PREFACE


The Purpose and Prerequisites of this Book 


Mathematical Statistics with Applications was written for use with an undergraduate 
1-year sequence of courses (9 quarter- or 6 semester-hours) on mathematical statistics. 
The intent of the text is to present a solid undergraduate foundation in statistical 
theory while providing an indication of the relevance and importance of the theory 
in solving practical problems in the real world. We think a course of this type is 
suitable for most undergraduate disciplines, including mathematics, where contact 
with applications may provide a refreshing and motivating experience. The only 
mathematical prerequisite is a thorough knowledge of first-year college calculus— 
including sums of infinite series, differentiation, and single and double integration. 


Our Approach 


Talking with students taking or having completed a beginning course in mathematical 
statistics reveals a major flaw in many courses. Students can take the course and leave 
it without a clear understanding of the nature of statistics. Many see the theory as a 
collection of topics, weakly or strongly related, but fail to see that statistics is a theory 
of information with inference as its goal. Further, they may leave the course without 
an understanding of the important role played by statistics in scientific investigations. 

These considerations led us to develop a text that differs from others in three ways: 


* First, the presentation of probability is preceded by a clear statement of the
objective of statistics—statistical inference—and its role in scientific research. 
As students proceed through the theory of probability (Chapters 2 through 7), 
they are reminded frequently of the role that major topics play in statistical 
inference. The cumulative effect is that statistical inference is the dominating 
theme of the course. 


* The second feature of the text is connectivity. We explain not only how major 
topics play a role in statistical inference, but also how the topics are related to 
one another. These integrating discussions appear most frequently in chapter
introductions and conclusions. 


* Finally, the text is unique in its practical emphasis, both in exercises throughout 
the text and in the useful statistical methodological topics contained in Chap- 
ters 11-15, whose goal is to reinforce the elementary but sound theoretical 
foundation developed in the initial chapters. 


The book can be used in a variety of ways and adapted to the tastes of students and 
instructors. The difficulty of the material can be increased or decreased by controlling 
the assignment of exercises, by eliminating some topics, and by varying the amount of 
time devoted to each topic. A stronger applied flavor can be added by the elimination 
of some topics—for example, some sections of Chapters 6 and 7—and by devoting 
more time to the applied chapters at the end. 


Changes in the Seventh Edition 


Many students are visual learners who can profit from visual reinforcement of con- 
cepts and results. New to this edition is the inclusion of computer applets, all available 
for online use at the Thomson website, www.thomsonedu.com/statistics/wackerly.
Some of these applets are used to demonstrate statistical concepts; other applets
permit users to assess the impact of parameter choices on the shapes of density
functions; and the remaining applets can be used to find exact probabilities and
quantiles associated with gamma-, beta-, normal-, χ²-, t-, and F-distributed random
variables—information of importance when constructing confidence intervals or per- 
forming tests of hypotheses. Some of the applets provide information available via 
the use of other software. Notably, the R language and environment for statistical 
computation and graphics (available free at http://www.r-project.org/) can be used to 
provide the quantiles and probabilities associated with the discrete and continuous 
distributions previously mentioned. The appropriate R commands are given in the 
respective sections of Chapters 3 and 4. The advantage of the applets is that they are 
“point and shoot,” provide accompanying graphics, and are considerably easier to 
use. However, R is vastly more powerful than the applets and can be used for many 
other statistical purposes. We leave other applications of R to the interested user or 
instructor. 

Chapter 2 introduces the first applet, Bayes' Rule as a Tree, a demonstration that 
allows users to see why sometimes surprising results occur when Bayes' rule is applied
(see Figure 1). As in the sixth edition, maximum-likelihood estimates are introduced in 
Chapter 3 via examples for the estimates of the parameters of the binomial, geometric, 
and negative binomial distributions based on specific observed numerical values of 
random variables that possess these distributions. Follow-up problems at the end of 
the respective sections expand on these examples. 

FIGURE 1
Applet illustration of
Bayes’ rule


FIGURE 2
Applet comparison of
three beta densities


In Chapter 4, the applet Normal Probabilities is used to compute the probability
that any user-specified, normally distributed random variable falls in any specified
interval. It also provides a graph of the selected normal density function and a visual
reinforcement of the fact that probabilities associated with any normally distributed
random variable are equivalent to probabilities associated with the standard normal
distribution. The applet Normal Probabilities (One Tail) provides upper-tail areas as- 
sociated with any user-specified, normal distribution and can also be used to establish 
the value that cuts off a user-specified area in the upper tail for any normally distributed 
random variable. Probabilities and quantiles associated with standard normal random 
variables are obtained by selecting the parameter values mean = 0 and standard de- 
viation = 1. The beta and gamma distributions are more thoroughly explored in this 
chapter. Users can simultaneously graph three gamma (or beta) densities (all with user-
selected parameter values) and assess the impact that the parameter values have on
the shapes of gamma (or beta) density functions (see Figure 2). This is accomplished
using the applets Comparison of Gamma Density Functions and Comparison of
Beta Density Functions, respectively. Probabilities and quantiles associated with
gamma- and beta-distributed random variables are obtained using the applets Gamma
Probabilities and Quantiles or Beta Probabilities and Quantiles. Sets of Applet Ex-
ercises are provided to guide the user to discover interesting and informative re-
sults associated with normal-, beta-, and gamma- (including exponential and χ²)
distributed random variables. We maintain emphasis on the χ² distribution, including
some theoretical results that are useful in the subsequent development of the t and F
distributions. 
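
The equivalence between probabilities for any normal distribution and probabilities for the standard normal distribution can be checked numerically. The following is a minimal sketch in Python, not the applet or the R commands referred to in the text; it uses only the standard library class statistics.NormalDist, and the mean, standard deviation, interval endpoints, and tail area are illustrative values assumed here.

```python
from statistics import NormalDist


def normal_interval_prob(mean, sd, a, b):
    """P(a < Y < b) for a normal random variable Y, computed directly."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(b) - dist.cdf(a)


def standardized_interval_prob(mean, sd, a, b):
    """Same probability, computed from the standard normal via z = (y - mean) / sd."""
    z = NormalDist()  # mean = 0 and standard deviation = 1
    return z.cdf((b - mean) / sd) - z.cdf((a - mean) / sd)


def upper_tail_cutoff(mean, sd, area):
    """Value that cuts off the given area in the upper tail."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(1 - area)


# Illustrative (assumed) values: mean 10, standard deviation 2, interval (8, 13)
direct = normal_interval_prob(10, 2, 8, 13)
via_z = standardized_interval_prob(10, 2, 8, 13)
cutoff = upper_tail_cutoff(0, 1, 0.05)
print(direct, via_z, cutoff)
```

The two probabilities agree because standardizing maps the interval (8, 13) to (-1, 1.5) on the standard normal scale.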

In Chapter 5, it is made clear that conditional densities are undefined for values of 
the conditioning variable where the marginal density is zero. We have also retained 
the discussion of the "conditional variance" and its use in finding the variance of 
a random variable. Hierarchical models are briefly discussed. As in the previous 
edition, Chapter 6 introduces the concept of the support of a density and emphasizes 
that a transformation method can be used when the transformation is monotone on the 
region of support. The Jacobian method is included for implementation of a bivariate 
transformation. 

In Chapter 7, the applet Comparison of Student's t and Normal Distributions per-
mits visualization of similarities and differences in t and standard normal density func-
tions, and the applets Chi-Square Probabilities and Quantiles, Student's t Probabili-
ties and Quantiles, and F-Ratio Probabilities and Quantiles provide probabilities and
quantiles associated with the respective distributions, all with user-specified degrees 
of freedom. The applet DiceSample uses the familiar die-tossing example to intro- 
duce the concept of a sampling distribution. The results for different sample sizes 
permit the user to assess the impact of sample size on the sampling distribution of the 
sample mean. The applet also permits visualization of how the sampling distribution 
is affected if the die is not balanced. Under the general heading of "Sampling Dis- 
tributions and the Central Limit Theorem," four different applets illustrate different 
concepts: 


* Basic illustrates that, when sampling from a normally distributed population, 
the sample mean is itself normally distributed. 

* SampleSize exhibits the effect of the sample size on the sampling distribution of
the sample mean. The sampling distributions for two (user-selected) sample sizes
are simultaneously generated and displayed side by side. Similarities and differ-
ences of the sampling distributions become apparent. Samples can be generated
from populations with “normal,” uniform, U-shaped, and skewed distributions.
The associated approximating normal sampling distributions can be overlaid
on the resulting simulated distributions, permitting immediate visual assessment
of the quality of the normal approximation (see Figure 3).

* Variance simulates the sampling distribution of the sample variance when sam-
pling from a population with a “normal” distribution. The theoretical (propor-
tional to that of a χ² random variable) distribution can be overlaid with the
click of a button, again providing visual confirmation that theory really works.

* VarianceSize allows a comparison of the effect of the sample size on the distri-
bution of the sample variance (again, sampling from a normal population). The
associated theoretical density can be overlaid to see that the theory actually
works. In addition, it is seen that for large sample sizes the sample variance has
an approximate normal distribution.


FIGURE 3
Applet illustration of
the central limit
theorem.
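
The die-tossing idea behind these sampling-distribution applets can be imitated with a short simulation. The sketch below is not the DiceSample applet itself; it is a minimal Python illustration using the standard library, and the sample sizes, the number of simulated samples, the seed, and the optional weights for an unbalanced die are all assumptions made here.

```python
import random
from statistics import mean, pvariance


def simulate_sample_means(sample_size, num_samples, weights=None, seed=0):
    """Simulate sample means of die tosses.

    weights=None gives a balanced die; a list of six weights gives an
    unbalanced die (hypothetical parameter for illustration).
    """
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    means = []
    for _ in range(num_samples):
        tosses = rng.choices(faces, weights=weights, k=sample_size)
        means.append(sum(tosses) / sample_size)
    return means


# Compare the simulated sampling distributions for two sample sizes
means_small = simulate_sample_means(sample_size=2, num_samples=5000)
means_large = simulate_sample_means(sample_size=30, num_samples=5000)
var_small = pvariance(means_small)
var_large = pvariance(means_large)
print(mean(means_small), var_small)
print(mean(means_large), var_large)
```

Both sets of sample means center near 3.5, the mean of a single toss of a balanced die, while the spread of the sample means shrinks as the sample size grows.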


The applet Normal Approximation to the Binomial permits the user to assess the quality 
of the (continuous) normal approximation for (discrete) binomial probabilities.
As in previous chapters, a sequence of Applet Exercises leads the user to discover 
important and interesting answers and concepts. From a more theoretical perspective, 
we establish the independence of the sample mean and sample variance for a sample 
of size 2 from a normal distribution. As before, the proof of this result for general 
n is contained in an optional exercise. Exercises provide step-by-step derivations of 
the mean and variance for random variables with t and F distributions.
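
The quality of the normal approximation to binomial probabilities can also be assessed numerically. The sketch below is a minimal Python illustration, not the applet; it compares an exact binomial probability with a normal approximation that uses the standard half-unit continuity correction (a technique assumed here, since the preface does not describe how the applet computes its approximation). The values n = 25, p = 0.5, and k = 8 are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for a binomial random variable with n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(Y <= k) with a normal distribution having mean np and
    standard deviation sqrt(np(1 - p)), using a continuity correction."""
    approximating = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approximating.cdf(k + 0.5)


# Illustrative (assumed) values
exact = binomial_cdf(8, 25, 0.5)
approx = normal_approx_cdf(8, 25, 0.5)
print(exact, approx)
```

Repeating the comparison with p far from 0.5 or with small n shows how the approximation degrades.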

Throughout Chapter 8, we have stressed the assumptions associated with confi-
dence intervals based on the t distributions. We have also included a brief discussion
of the robustness of the t procedures and the lack of such for the intervals based
on the χ² and F distributions. The applet ConfidenceIntervalP illustrates properties
of large-sample confidence intervals for a population proportion. In Chapter 9, the 
applets PointSingle, PointbyPoint, and PointEstimation ultimately lead to a very nice 
illustration of convergence in probability. In Chapter 10, the applet Hypothesis Testing
(for Proportions) illustrates important concepts associated with tests of hypotheses,
including the following:


* What does α really mean?

* Tests based on larger sample sizes typically have smaller probabilities of type
II errors if the level of the tests stays fixed. 

* For a fixed sample size, the power function increases as the value of the parameter
moves further from the values specified by the null hypothesis. 


Once users visualize these concepts, the subsequent theoretical developments are 
more relevant and meaningful. Applets for the χ², t, and F distributions are used to
obtain exact p-values for associated tests of hypotheses. We also illustrate explicitly 
that the power of a uniformly most powerful test can be smaller (although the largest 
possible) than desired. 

In Chapter 11, the simple linear regression model is thoroughly discussed (including 
confidence intervals, prediction intervals, and correlation) before the matrix approach 
to the multiple linear regression model is introduced. The applets Fitting a Line Using
Least Squares and Removing Points from Regression illustrate what the least-squares 
criterion accomplishes and that a few unusual data points can have considerable 
impact on the fitted regression line. The coefficients of determination and multiple 
determination are introduced, discussed, and related to the relevant t and F statistics.
Exercises demonstrate that high (low) coefficients of (multiple) determination values 
do not necessarily correspond to statistically significant (insignificant) results. 

Chapter 12 includes a separate section on the matched-pairs experiment. Although 
many possible sets of dummy variables can be used to cast the analysis of variance 
into a regression context, in Chapter 13 we focus on the dummy variables typically 
used by SAS and other statistical analysis computing packages. The text still focuses 
primarily on the randomized block design with fixed (nonrandom) block effects. If 
an instructor wishes, a series of supplemental exercises dealing with the randomized 
block design with random block effects can be used to illustrate the similarities and 
differences of these two versions of the randomized block design. 

The new Chapter 16 provides a brief introduction to Bayesian methods of statistical 
inference. The chapter focuses on using the data and the prior distribution to obtain 
the posterior and using the posterior to produce estimates, credible intervals, and hy- 
pothesis tests for parameters. The applet Binomial Revision facilitates understanding 
of the process by which data are used to update the prior and obtain the posterior. 
Many of the posterior distributions are beta or gamma distributions, and previously 
discussed applets are instrumental in obtaining credible intervals or computing the 
probability of various hypotheses. 
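
The way data update a prior to give a posterior can be sketched for the binomial case. The following minimal Python example relies on the standard conjugate result that a Beta(α, β) prior on a binomial success probability, combined with y successes in n trials, gives a Beta(α + y, β + n − y) posterior. It is not the Binomial Revision applet, and the prior parameters and data are illustrative assumptions.

```python
def update_beta_prior(alpha, beta, successes, failures):
    """Return the parameters of the beta posterior for a binomial proportion."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a beta distribution with parameters alpha and beta."""
    return alpha / (alpha + beta)


# Illustrative (assumed) prior Beta(1, 3) and data: 6 successes, 4 failures
prior = (1, 3)
posterior = update_beta_prior(*prior, successes=6, failures=4)
print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lies between the prior mean and the observed proportion of successes, which is the kind of revision the paragraph above describes.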


The Exercises 


This edition contains more than 350 new exercises. Many of the new exercises use the 
applets previously mentioned to guide the user through a series of steps that lead to 
more thorough understanding of important concepts. Others use the applets to provide 
confidence intervals or p-values that could only be approximated by using tables in the 
appendix. As in previous editions, some of the new exercises are theoretical whereas
others contain data from documented sources that deal with research in a variety of 
fields. We continue to believe that exercises based on real data or actual experimental 
scenarios permit students to see the practical uses of the various statistical and proba- 
bilistic methods presented in the text. As they work through these exercises, students 
gain insight into the real-life applications of the theoretical results developed in the 
text. This insight makes learning the necessary theory more enjoyable and produces 
a deeper understanding of the theoretical methods. As in previous editions, the more 
challenging exercises are marked with an asterisk (*). Answers to the odd-numbered 
exercises are provided in the back of the book. 


Tables and Appendices 


We have maintained the use of the upper-tail normal tables because the users of the 
text find them to be more convenient. We have also maintained the format of the table 
of the F distributions that we introduced in previous editions. This table of the F 
distributions provides critical values corresponding to upper-tail areas of .100, .050, 
.025, .010, and .005 in a single table. Because tests based on statistics possessing 
the F distribution occur quite often, this table facilitates the computation of attained 
significance levels, or p-values, associated with observed values of these statistics. 

We have also maintained our practice of providing easy access to often-used 
information. Because the normal and t tables are the most frequently used statis-
tical tables in the text, copies of these tables are given in Appendix 3 and inside the
front cover of the text. Users of previous editions have often remarked favorably about 
the utility of tables of the common probability distributions, means, variances, and 
moment-generating functions provided in Appendix 2 and inside the back cover of 
the text. In addition, we have included some frequently used mathematical results in a 
supplement to Appendix 1. These results include the binomial expansion of (x + y)^n,
the series expansion of e^x, sums of geometric series, definitions of the gamma and
beta functions, and so on. As before, each chapter begins with an outline containing 
the titles of the major sections in that chapter. 
NOTE TO THE STUDENT


As the title Mathematical Statistics with Applications implies, this text is concerned 
with statistics, in both theory and application, and only deals with mathematics as a 
necessary tool to give you a firm understanding of statistical techniques. The following 
suggestions for using the text will increase your learning and save your time. 

The connectivity of the book is provided by the introductions and summaries in 
each chapter. These sections explain how each chapter fits into the overall picture of 
statistical inference and how each chapter relates to the preceding ones. 

Within the chapters, important concepts are set off as definitions. These should be 
read and reread until they are clearly understood because they form the framework 
on which everything else is built. The main theoretical results are set off as theo- 
rems. Although it is not necessary to understand the proof of each theorem, a clear 
understanding of the meaning and implications of the theorems is essential. 

It is also essential that you work many of the exercises—for at least four reasons: 


* You can be certain that you understand what you have read only by putting your
knowledge to the test of working problems. 

* Many of the exercises are of a practical nature and shed light on the applications 
of probability and statistics. 

* Some of the exercises present new concepts and thus extend the material covered
in the chapter. 

* Many of the applet exercises help build intuition, facilitate understanding of 
concepts, and provide answers that cannot (practically) be obtained using tables 
in the appendices (see Figure 4). 


D. D. W. 
W. M. 
R. L. S. 
Introduction


Statistical techniques are employed in almost every phase of life. Surveys are de- 
signed to collect early returns on election day and forecast the outcome of an election. 
Consumers are sampled to provide information for predicting product preferences. 
Research physicians conduct experiments to determine the effect of various drugs and 
controlled environmental conditions on humans in order to infer the appropriate treat- 
ment for various illnesses. Engineers sample a product quality characteristic and var- 
ious controllable process variables to identify key variables related to product quality. 
Newly manufactured electronic devices are sampled before shipping to decide whether 
to ship or hold individual lots. Economists observe various indices of economic health 
over a period of time and use the information to forecast the condition of the economy 
in the future. Statistical techniques play an important role in achieving the objective 
of each of these practical situations. The development of the theory underlying these 
techniques is the focus of this text. 

A prerequisite to a discussion of the theory of statistics is a definition of statis- 
tics and a statement of its objectives. Webster’s New Collegiate Dictionary defines 
statistics as “a branch of mathematics dealing with the collection, analysis, interpre-
tation, and presentation of masses of numerical data.” Stuart and Ord (1991) state:
“Statistics is the branch of the scientific method which deals with the data obtained by 
counting or measuring the properties of populations.” Rice (1995), commenting on 
experimentation and statistical applications, states that statistics is “essentially con- 
cerned with procedures for analyzing data, especially data that in some vague sense 
have a random character.” Freund and Walpole (1987), among others, view statistics 
as encompassing “the science of basing inferences on observed data and the entire 
problem of making decisions in the face of uncertainty.” And Mood, Graybill, and
Boes (1974) define statistics as “the technology of the scientific method” and add 
that statistics is concerned with “(1) the design of experiments and investigations, 
(2) statistical inference.” A superficial examination of these definitions suggests a 
substantial lack of agreement, but all possess common elements. Each description 
implies that data are collected, with inference as the objective. Each requires select- 
ing a subset of a large collection of data, either existent or conceptual, in order to 
infer the characteristics of the complete set. All the authors imply that statistics is a 
theory of information, with inference making as its objective. 

The large body of data that is the target of our interest is called the population, and 
the subset selected from it is a sample. The preferences of voters for a gubernatorial 
candidate, Jones, expressed in quantitative form (1 for “prefer” and 0 for “do not
prefer”) provide a real, finite, and existing population of great interest to Jones. To
determine the true fraction who favor his election, Jones would need to interview 
all eligible voters—a task that is practically impossible. The voltage at a particular 
point in the guidance system for a spacecraft may be tested in the only three sys- 
tems that have been built. The resulting data could be used to estimate the voltage 
characteristics for other systems that might be manufactured some time in the future. 
In this case, the population is conceptual. We think of the sample of three as being 
representative of a large population of guidance systems that could be built using the 
same method. Presumably, this population would possess characteristics similar to 
the three systems in the sample. Analogously, measurements on patients in a medical 
experiment represent a sample from a conceptual population consisting of all patients 
similarly afflicted today, as well as those who will be afflicted in the near future. You 
will find it useful to clearly define the populations of interest for each of the scenarios 
described earlier in this section and to clarify the inferential objective for each. 

It is interesting to note that billions of dollars are spent each year by U.S. indus- 
try and government for data from experimentation, sample surveys, and other data 
collection procedures. This money is expended solely to obtain information about 
phenomena susceptible to measurement in areas of business, science, or the arts. The 
implications of this statement provide keys to the nature of the very valuable contri- 
bution that the discipline of statistics makes to research and development in all areas 
of society. Information useful in inferring some characteristic of a population (either 
existing or conceptual) is purchased in a specified quantity and results in an inference 
(estimation or decision) with an associated degree of goodness. For example, if Jones 
arranges for a sample of voters to be interviewed, the information in the sample can be 
used to estimate the true fraction of all voters who favor Jones's election. In addition 
to the estimate itself, Jones should also be concerned with the likelihood (chance) 
that the estimate provided is close to the true fraction of eligible voters who favor his 
election. Intuitively, the larger the number of eligible voters in the sample, the higher 
will be the likelihood of an accurate estimate. Similarly, if a decision is made regarding 
the relative merits of two manufacturing processes based on examination of samples 
of products from both processes, we should be interested in the decision regarding 
which is better and the likelihood that the decision is correct. In general, the study of 
statistics is concerned with the design of experiments or sample surveys to obtain a 
specified quantity of information at minimum cost and the optimum use of this infor- 
mation in making an inference about a population. The objective of statistics is to make 
an inference about a population based on information contained in a sample from
that population and to provide an associated measure of goodness for the inference. 
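
The claim that larger samples make an accurate estimate more likely can be illustrated by simulation. The sketch below is a minimal Python example under stated assumptions: the true fraction of voters favoring Jones (0.55), the tolerance (0.05), the two sample sizes, and the seed are all hypothetical, and voters are treated as independent draws from a very large population.

```python
import random


def fraction_of_close_estimates(true_fraction, sample_size, tolerance,
                                num_repetitions=1000, seed=1):
    """Estimate the chance that a sample fraction falls within `tolerance`
    of the true fraction, by repeating the sampling many times."""
    rng = random.Random(seed)
    close = 0
    for _ in range(num_repetitions):
        # Each sampled voter favors the candidate with probability true_fraction
        favorable = sum(1 for _ in range(sample_size)
                        if rng.random() < true_fraction)
        estimate = favorable / sample_size
        if abs(estimate - true_fraction) <= tolerance:
            close += 1
    return close / num_repetitions


small = fraction_of_close_estimates(0.55, sample_size=50, tolerance=0.05)
large = fraction_of_close_estimates(0.55, sample_size=1000, tolerance=0.05)
print("sample of 50:", small)
print("sample of 1000:", large)
```

The proportion of simulated estimates that land close to the true fraction serves as a rough measure of goodness for the estimate at each sample size.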


Exercises 


For each of the following situations, identify the population of interest, the inferential objective, 
and how you might go about collecting a sample. 


a The National Highway Safety Council wants to estimate the proportion of automobile tires 
with unsafe tread among all tires manufactured by a specific company during the current 
production year. 

b A political scientist wants to determine whether a majority of adult residents of a state favor 
a unicameral legislature. 

c A medical scientist wants to estimate the average length of time until the recurrence of a
certain disease. 

d An electrical engineer wants to determine whether the average length of life of transistors
of a certain type is greater than 500 hours. 

e A university researcher wants to estimate the proportion of U.S. citizens from 
"Generation X" who are interested in starting their own businesses. 

f For more than a century, normal body temperature for humans has been accepted to be
98.6° Fahrenheit. Is it really? Researchers want to estimate the average temperature of 
healthy adults in the United States. 

g A city engineer wants to estimate the average weekly water consumption for single-family
dwelling units in the city. 


Characterizing a Set of Measurements: 
Graphical Methods 


In the broadest sense, making an inference implies partially or completely describing 
a phenomenon or physical object. Little difficulty is encountered when appropriate 
and meaningful descriptive measures are available, but this is not always the case. 
For example, we might characterize a person by using height, weight, color of hair 
and eyes, and other descriptive measures of the person's physiognomy. Identifying a 
set of descriptive measures to characterize an oil painting would be a comparatively 
more difficult task. Characterizing a population that consists of a set of measurements 
is equally challenging. Consequently, a necessary prelude to a discussion of inference 
making is the acquisition of a method for characterizing a set of numbers. The charac- 
terizations must be meaningful so that knowledge of the descriptive measures enables 
us to clearly visualize the set of numbers. In addition, we require that the characteriza- 
tions possess practical significance so that knowledge of the descriptive measures for 
a population can be used to solve a practical, nonstatistical problem. We will develop 
our ideas on this subject by examining a process that generates a population. 

FIGURE 1.1
Relative frequency
histogram


Consider a study to determine important variables affecting profit in a business that
manufactures custom-made machined devices. Some of these variables might be the
dollar size of the contract, the type of industry with which the contract is negotiated,
the degree of competition in acquiring contracts, the salesperson who estimates the
contract, fixed dollar costs, and the supervisor who is assigned the task of organizing
and conducting the manufacturing operation. The statistician will wish to measure the 
response or dependent variable, profit per contract, for several jobs (the sample). Along 
with recording the profit, the statistician will obtain measurements on the variables 
that might be related to profit—the independent variables. His or her objective is to 
use information in the sample to infer the approximate relationship of the independent 
variables just described to the dependent variable, profit, and to measure the strength 
of this relationship. The manufacturer's objective is to determine optimum conditions 
for maximizing profit. 

The population of interest in the manufacturing problem is conceptual and consists 
of all measurements of profit (per unit of capital and labor invested) that might be 
made on contracts, now and in the future, for fixed values of the independent variables 
(size of the contract, measure of competition, etc.). The profit measurements will vary 
from contract to contract in a seemingly random manner as a result of variations in 
materials, time needed to complete individual segments of the work, and other uncon- 
trollable variables affecting the job. Consequently, we view the population as being 
represented by a distribution of profit measurements, with the form of the distribution 
depending on specific values of the independent variables. Our wish to determine the 
relationship between the dependent variable, profit, and a set of independent variables 
is therefore translated into a desire to determine the effect of the independent variables 
on the conceptual distribution of population measurements. 

An individual population (or any set of measurements) can be characterized by 
a relative frequency distribution, which can be represented by a relative frequency 
histogram. A graph is constructed by subdividing the axis of measurement into inter-
vals of equal width. A rectangle is constructed over each interval, such that the height
of the rectangle is proportional to the fraction of the total number of measurements
falling in that interval. For example, to characterize the ten measurements 2.1, 2.4, 2.2,
2.3, 2.7, 2.5, 2.4, 2.6, 2.6, and 2.9, we could divide the axis of measurement into in- 
tervals of equal width (say, .2 unit), commencing with 2.05. The relative frequencies 
(fraction of total number of measurements), calculated for each interval, are shown 
in Figure 1.1. Notice that the figure gives a clear pictorial description of the entire set 
of ten measurements. 
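
The relative frequencies for this example can be computed directly. The following is a minimal Python sketch of the tabulation described above; the choice of five intervals (covering 2.05 to 3.05) is an assumption made here so that the largest measurement, 2.9, is included.

```python
def relative_frequencies(data, start, width, num_intervals):
    """Fraction of the measurements falling in each equal-width interval,
    where interval i runs from start + i * width to start + (i + 1) * width."""
    counts = [0] * num_intervals
    for x in data:
        index = int((x - start) / width)
        counts[index] += 1
    return [count / len(data) for count in counts]


measurements = [2.1, 2.4, 2.2, 2.3, 2.7, 2.5, 2.4, 2.6, 2.6, 2.9]
rel_freq = relative_frequencies(measurements, start=2.05, width=0.2,
                                num_intervals=5)

for i, fraction in enumerate(rel_freq):
    lower = 2.05 + i * 0.2
    print(f"{lower:.2f} to {lower + 0.2:.2f}: {fraction:.1f}")
```

Because the points of subdivision (2.05, 2.25, and so on) carry one more decimal place than the data, no measurement can fall on a boundary.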

Observe that we have not given precise rules for selecting the number, widths, 
or locations of the intervals used in constructing a histogram. This is because the 
selection of these items is somewhat at the discretion of the person who is involved
in the construction. 

Although these choices are somewhat arbitrary, a few guidelines can be very helpful in selecting the
intervals. Points of subdivision of the axis of measurement should be chosen so that it is
impossible for a measurement to fall on a point of division. This eliminates a source of
confusion and is easily accomplished, as indicated in Figure 1.1. The second guideline
involves the width of each interval and, consequently, the minimum number of intervals
needed to describe the data. Generally speaking, we wish to obtain information on the 
form of the distribution of the data. Many times the form will be mound-shaped, as 
illustrated in Figure 1.2. (Others prefer to refer to distributions such as these as bell- 
shaped, or normal.) Using many intervals with a small amount of data results in little 
summarization and presents a picture very similar to the data in their original form. 
The larger the amount of data, the greater the number of included intervals can be while
still presenting a satisfactory picture of the data. We suggest spanning the range of the 
data with from 5 to 20 intervals and using the larger number of intervals for larger 
quantities of data. In most real-life applications, computer software (Minitab, SAS,
R, S+, JMP, etc.) is used to obtain any desired histograms. These computer packages
all produce histograms satisfying widely agreed-upon constraints on scaling, number 
of intervals used, widths of intervals, and the like. 

Some people feel that the description of data is an end in itself. Histograms are 
often used for this purpose, but there are many other graphical methods that provide 
meaningful summaries of the information contained in a set of data. Some excellent 
references for the general topic of graphical descriptive methods are given in the 
references at the end of this chapter. Keep in mind, however, that the usual objective 
of statistics is to make inferences. The relative frequency distribution associated with a 
data set and the accompanying histogram are sufficient for our objectives in developing 
the material in this text. This is primarily due to the probabilistic interpretation that 
can be derived from the frequency histogram, Figure 1.1. We have already stated that 
the area of a rectangle over a given interval is proportional to the fraction of the total 
number of measurements falling in that interval. Let's extend this idea one step further. 

If a measurement is selected at random from the original data set, the probability 
that it will fall in a given interval is proportional to the area under the histogram lying 
over that interval. (At this point, we rely on the layperson's concept of probability. 
This term is discussed in greater detail in Chapter 2.) For example, for the data used 
to construct Figure 1.1, the probability that a randomly selected measurement falls in 
the interval from 2.05 to 2.45 is .5 because half the measurements fall in this interval. 
Correspondingly, the area under the histogram in Figure 1.1 over the interval from 
2.05 to 2.45 is half of the total area under the histogram. It is clear that this interpretation
applies to the distribution of any set of measurements, whether a population or a sample.
Suppose that Figure 1.2 gives the relative frequency distribution of profit (in mil- 
lions of dollars) for a conceptual population of profit responses for contracts at spec- 
ified settings of the independent variables (size of contract, measure of competition, 
etc.). The probability that the next contract (at the same settings of the independent 
variables) yields a profit that falls in the interval from 2.05 to 2.45 million is given by 
the proportion of the area under the distribution curve that is shaded in Figure 1.2. 
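
As a small check on the probabilistic interpretation, the fraction of the ten measurements of Section 1.2 lying between 2.05 and 2.45 can be counted directly. The sketch below assumes that "selected at random" means each of the ten measurements is equally likely to be chosen; the variable names are illustrative only.

```python
from fractions import Fraction

# The ten measurements used for Figure 1.1, kept as strings so that the
# comparison with the interval end points is exact.
data = ["2.1", "2.4", "2.2", "2.3", "2.7", "2.5", "2.4", "2.6", "2.6", "2.9"]
lo, hi = Fraction("2.05"), Fraction("2.45")

# Count the measurements falling in the interval from 2.05 to 2.45.
inside = sum(1 for y in data if lo < Fraction(y) < hi)

# With equally likely selection, the probability is the relative frequency.
prob = Fraction(inside, len(data))
print(prob)
```

The result agrees with the value .5 stated in the text.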


Exercises 


1.2 Are some cities more windy than others? Does Chicago deserve to be nicknamed “The Windy
City”? Given below are the average wind speeds (in miles per hour) for 45 selected U.S. cities:


 8.9  12.4   8.6  11.3   9.2   8.8  35.1   6.2   7.0
 7.1  11.8  10.7   7.6   9.1   9.2   8.2   9.0   8.7
 9.1  10.9  10.3   9.6   7.8  11.5   9.3   7.9   8.8
 8.8  12.7   8.4   7.8   5.7  10.5  10.5   9.6   8.9
10.2  10.3   7.1  10.6   8.3   8.8   9.5   8.8   9.4


Source: The World Almanac and Book of Facts, 2004. 


a Construct a relative frequency histogram for these data. (Choose the class boundaries 
without including the value 35.1 in the range of values.) 

b The value 35.1 was recorded at Mt. Washington, New Hampshire. Does the geography of
that city explain the magnitude of its average wind speed?

c The average wind speed for Chicago is 10.3 miles per hour. What percentage of the cities
have average wind speeds in excess of Chicago's? 


d Do you think that Chicago is unusually windy? 


1.3 Of great importance to residents of central Florida is the amount of radioactive material present
in the soil of reclaimed phosphate mining areas. Measurements of the amount of ²³⁸U in 25 soil
samples were as follows (measurements in picocuries per gram):


 .74   6.47   1.90   2.69    .75
 .32   9.99   1.77   2.41   1.96
1.66    .70   2.42    .54   3.36
3.59    .37   1.00   8.32   4.06
4.55    .76   2.03   5.70  12.48


Construct a relative frequency histogram for these data. 


1.4 The top 40 stocks on the over-the-counter (OTC) market, ranked by percentage of outstanding
shares traded on one day last year, are as follows:


11.88  6.27  5.49  4.81  4.40  3.78  3.44  3.11  2.88  2.68
 7.99  6.07  5.26  4.79  4.05  3.69  3.36  3.03  2.74  2.63
 7.15  5.98  5.07  4.55  3.94  3.62  3.26  2.99  2.74  2.62
 7.13  5.91  4.94  4.43  3.93  3.48  3.20  2.89  2.69  2.61


a Construct a relative frequency histogram to describe these data. 


b What proportion of these top 40 stocks traded more than 4% of the outstanding shares? 


c If one of the stocks is selected at random from the 40 for which the preceding data were
taken, what is the probability that it will have traded fewer than 5% of its outstanding shares? 


1.5 Given here is the relative frequency histogram associated with grade point averages (GPAs) of 
a sample of 30 students: 


[Figure: relative frequency histogram of grade point averages. Vertical axis: Relative Frequency. Horizontal axis: Grade Point Average, with class boundaries at 1.85, 2.05, 2.25, 2.45, 2.65, 2.85, 3.05, 3.25, and 3.45.]


a Which of the GPA categories identified on the horizontal axis are associated with the largest 
proportion of students? 
b What proportion of students had GPAs in each of the categories that you identified? 
c What proportion of the students had GPAs less than 2.65?

1.6 The relative frequency histogram given next was constructed from data obtained from a random
sample of 25 families. Each was asked the number of quarts of milk that had been purchased
the previous week.


[Figure: relative frequency histogram of the number of quarts of milk purchased. Vertical axis: Relative Frequency.]


a Use this relative frequency histogram to determine the number of quarts of milk purchased
by the largest proportion of the 25 families. The category associated with the largest relative 
frequency is called the modal category. 


b What proportion of the 25 families purchased more than 2 quarts of milk? 


c What proportion purchased more than 0 but fewer than 5 quarts?


1.7 The self-reported heights of 105 students in a biostatistics class were used to construct the
histogram given below.


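[Figure: relative frequency histogram of heights. Vertical axis: Relative frequency, marked at 5/105 and 10/105. Horizontal axis: Heights, marked at 60, 63, 66, 69, 72, and 75.]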


a Describe the shape of the histogram.

b Does this histogram have an unusual feature?

c Can you think of an explanation for the two peaks in the histogram? Is there some consideration
other than height that results in the two separate peaks? What is it?


1.8 An article in Archaeometry presented an analysis of 26 samples of Romano-British pottery
found at four different kiln sites in the United Kingdom. The percentage of aluminum oxide in
each of the 26 samples is given below:


Llanederyn:     14.4, 13.8, 14.6, 11.5, 13.8, 10.9, 10.1, 11.6, 11.1, 13.4, 12.4, 13.1, 12.7, 12.5
Caldicot:       11.8, 11.6
Island Thorns:  18.3, 15.8, 18.0, 18.0, 20.8
Ashley Rails:   17.7, 18.3, 16.7, 14.8, 19.1


Source: A. Tubb, A. J. Parker, and G. Nickless, “The Analysis of Romano-British Pottery by Atomic
Absorption Spectrophotometry,” Archaeometry 22 (1980): 153.


a Construct a relative frequency histogram to describe the aluminum oxide content of all 
26 pottery samples. 


b What unusual feature do you see in this histogram? Looking at the data, can you think of 
an explanation for this unusual feature? 


Characterizing a Set of Measurements: 
Numerical Methods 


The relative frequency histograms presented in Section 1.2 provide useful information
regarding the distribution of sets of measurements, but histograms are usually
not adequate for the purpose of making inferences. Indeed, many similar histograms
could be formed from the same set of measurements. To make inferences about a
population based on information contained in a sample and to measure the goodness 
of the inferences, we need rigorously defined quantities for summarizing the infor- 
mation contained in a sample. These sample quantities typically have mathematical 
properties, to be developed in the following chapters, that allow us to make probability 
statements regarding the goodness of our inferences. 

The quantities we define are numerical descriptive measures of a set of data. 
We seek some numbers that have meaningful interpretations and that can be used 
to describe the frequency distribution for any set of measurements. We will confine 
our attention to two types of descriptive numbers: measures of central tendency and 
measures of dispersion or variation. 

Probably the most common measure of central tendency used in statistics is the 
arithmetic mean. (Because this is the only type of mean discussed in this text, we will 
omit the word arithmetic.) 


The mean of a sample of n measured responses $y_1, y_2, \ldots, y_n$ is given by

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

The corresponding population mean is denoted $\mu$.


The symbol $\bar{y}$, read “y bar,” refers to a sample mean. We usually cannot measure
the value of the population mean, $\mu$; rather, $\mu$ is an unknown constant that we may
want to estimate using sample information.

The mean of a set of measurements only locates the center of the distribution 
of data; by itself, it does not provide an adequate description of a set of measure- 
ments. Two sets of measurements could have widely different frequency distributions 
but equal means, as pictured in Figure 1.3. The difference between distributions I 
and II in the figure lies in the variation or dispersion of measurements on either 
side of the mean. To describe data adequately, we must also define measures of data 
variability. 

The most common measure of variability used in statistics is the variance, which is a
function of the deviations (or distances) of the sample measurements from their mean.

The variance of a sample of measurements $y_1, y_2, \ldots, y_n$ is the sum of the
squares of the differences between the measurements and their mean, divided
by $n - 1$. Symbolically, the sample variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The corresponding population variance is denoted by the symbol $\sigma^2$.


Notice that we divided by $n - 1$ instead of by $n$ in our definition of $s^2$. The
theoretical reason for this choice of divisor is provided in Chapter 8, where we will
show that $s^2$ defined this way provides a “better” estimator for the true population
variance, $\sigma^2$. Nevertheless, it is useful to think of $s^2$ as “almost” the average of the
squared deviations of the observed values from their mean. The larger the variance of 
a set of measurements, the greater will be the amount of variation within the set. The 
variance is of value in comparing the relative variation of two sets of measurements, 
but it gives information about the variation in a single set only when interpreted in 
terms of the standard deviation. 


The standard deviation of a sample of measurements is the positive square root
of the variance; that is,

$$s = \sqrt{s^2}.$$

The corresponding population standard deviation is denoted by $\sigma = \sqrt{\sigma^2}$.
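
The three definitions can be traced numerically. The following sketch applies them to the ten measurements used in Section 1.2; the data set is reused only for illustration, and the cross-check against Python's standard `statistics` module (whose `variance` and `stdev` functions also divide by n - 1) is our addition.

```python
import math
import statistics

# The ten measurements from Section 1.2, reused here for illustration.
y = [2.1, 2.4, 2.2, 2.3, 2.7, 2.5, 2.4, 2.6, 2.6, 2.9]
n = len(y)

# Sample mean: the sum of the measurements divided by n.
y_bar = sum(y) / n

# Sample variance: the sum of squared deviations divided by n - 1.
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Sample standard deviation: the positive square root of the variance.
s = math.sqrt(s2)

# Cross-check against the standard library.
assert math.isclose(y_bar, statistics.mean(y))
assert math.isclose(s2, statistics.variance(y))
assert math.isclose(s, statistics.stdev(y))

print(f"mean = {y_bar:.3f}, variance = {s2:.4f}, standard deviation = {s:.4f}")
```

For these data the mean is 2.47 and the variance is .521/9, or about .058, so the standard deviation is roughly .24.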


Although it is closely related to the variance, the standard deviation can be used to 
give a fairly accurate picture of data variation for a single set of measurements. It can be 
interpreted using Tchebysheff’s theorem (which is discussed in Exercise 1.32 and will 
be presented formally in Chapter 3) and by the empirical rule (which we now explain). 

Many distributions of data in real life are mound-shaped; that is, they can be 
approximated by a bell-shaped frequency distribution known as a normal curve. 
Data possessing mound-shaped distributions have definite characteristics of varia- 
tion, as expressed in the following statement. 


Empirical Rule 


For a distribution of measurements that is approximately normal (bell shaped),
it follows that the interval with end points

$\mu \pm \sigma$ contains approximately 68% of the measurements.
$\mu \pm 2\sigma$ contains approximately 95% of the measurements.
$\mu \pm 3\sigma$ contains almost all of the measurements.


[Figure 1.4: Normal curve]

As was mentioned in Section 1.2, once the frequency distribution of a set of measurements
is known, probability statements regarding the measurements can be made.
These probabilities were shown as areas under a frequency histogram. Analogously, 
the probabilities specified in the empirical rule are areas under the normal curve shown 
in Figure 1.4. 

Use of the empirical rule is illustrated by the following example. Suppose that the 
scores on an achievement test given to all high school seniors in a state are known to 
have, approximately, a normal distribution with mean $\mu = 64$ and standard deviation
$\sigma = 10$. It can then be deduced that approximately 68% of the scores are between 54
and 74, that approximately 95% of the scores are between 44 and 84, and that almost 
all of the scores are between 34 and 94. Thus, knowledge of the mean and the standard 
deviation gives us a fairly good picture of the frequency distribution of scores. 

Suppose that a single high school student is randomly selected from those who took 
the test. What is the probability that his score will be between 54 and 74? Based on the 
empirical rule, we find that 0.68 is a reasonable answer to this probability question. 
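
The intervals in this example, and the areas that the empirical rule rounds to 68%, 95%, and "almost all," can be reproduced with a short computation. This sketch assumes the scores are exactly normal, which the text states only approximately, and it uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 64, 10              # mean and standard deviation from the example
scores = NormalDist(mu, sigma)  # assumption: an exactly normal distribution

results = []
for k in (1, 2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    # Area under the normal curve between mu - k*sigma and mu + k*sigma.
    p = scores.cdf(hi) - scores.cdf(lo)
    results.append((lo, hi, p))
    print(f"{lo} to {hi}: {p:.4f}")
```

The areas for k = 1 and k = 2 are close to .68 and .95, and the area for k = 3 exceeds .997, which is the sense in which "almost all" of the measurements lie within three standard deviations of the mean.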

The utility and value of the empirical rule are due to the common occurrence 
of approximately normal distributions of data in nature—more so because the rule 
applies to distributions that are not exactly normal but just mound-shaped. You will 
find that approximately 95% of a set of measurements will be within $2\sigma$ of $\mu$ for a
variety of distributions.


Exercises 


1.9 Resting breathing rates for college-age students are approximately normally distributed with
mean 12 and standard deviation 2.3 breaths per minute. What fraction of all college-age students
have breathing rates in the following intervals?

a 9.7 to 14.3 breaths per minute

b 7.4 to 16.6 breaths per minute

c 9.7 to 16.6 breaths per minute

d Less than 5.1 or more than 18.9 breaths per minute


1.10 It has been projected that the average and standard deviation of the amount of time spent online
using the Internet are, respectively, 14 and 17 hours per person per year (many do not use
the Internet at all!).

a What value is exactly 1 standard deviation below the mean?

b If the amount of time spent online using the Internet is approximately normally distributed,
what proportion of the users spend an amount of time online that is less than the value you
found in part (a)?

c Is the amount of time spent online using the Internet approximately normally distributed?
Why?


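1.11 The following results on summations will help us in calculating the sample variance $s^2$. For
any constant $c$,

a $\sum_{i=1}^{n} c = nc$.

b $\sum_{i=1}^{n} c y_i = c \sum_{i=1}^{n} y_i$.

c $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$.

Use (a), (b), and (c) to show that

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right)^2 \right].$$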

1.12 Use the result of Exercise 1.11 to calculate $s$ for the $n = 6$ sample measurements 1, 4, 2, 1, 3,
and 3.


1.13 Refer to Exercise 1.2.

a Calculate $\bar{y}$ and $s$ for the data given.

b Calculate the interval $\bar{y} \pm ks$ for $k = 1$, 2, and 3. Count the number of measurements that
fall within each interval and compare this result with the number that you would expect
according to the empirical rule.


1.14 Refer to Exercise 1.3 and repeat parts (a) and (b) of Exercise 1.13.

1.15 Refer to Exercise 1.4 and repeat parts (a) and (b) of Exercise 1.13.


1.16 In Exercise 1.4, there is one extremely large value (11.88). Eliminate this value and calculate
$\bar{y}$ and $s$ for the remaining 39 observations. Also, calculate the intervals $\bar{y} \pm ks$ for $k = 1$,
2, and 3; count the number of measurements in each; then compare these results with those
predicted by the empirical rule. Compare the answers here to those found in Exercise 1.15.
Note the effect of a single large observation on $\bar{y}$ and $s$.


1.17 The range of a set of measurements is the difference between the largest and the smallest values.
The empirical rule suggests that the standard deviation of a set of measurements may be roughly 
approximated by one-fourth of the range (that is, range/4). Calculate this approximation to s 
for the data sets in Exercises 1.2, 1.3, and 1.4. Compare the result in each case to the actual, 
calculated value of s. 


1.18 The College Board’s verbal and mathematics Scholastic Aptitude Tests are scored on a scale of
200 to 800. It seems reasonable to assume that the test scores are approximately
normally distributed for both tests. Use the result from Exercise 1.17 to approximate the standard
deviation for scores on the verbal test.


1.19 According to the Environmental Protection Agency, chloroform, which in its gaseous form
is suspected to be a cancer-causing agent, is present in small quantities in all the country’s
240,000 public water sources. If the mean and standard deviation of the amounts of chloroform
present in water sources are 34 and 53 micrograms per liter (μg/L), respectively, explain why
chloroform amounts do not have a normal distribution.

1.20 Weekly maintenance costs for a factory, recorded over a long period of time and adjusted
for inflation, tend to have an approximately normal distribution with an average of $420 and a 
standard deviation of $30. If $450 is budgeted for next week, what is an approximate probability 
that this budgeted figure will be exceeded? 


1.21 The manufacturer of a new food additive for beef cattle claims that 80% of the animals fed a
diet including this additive should have monthly weight gains in excess of 20 pounds. A large 
sample of measurements on weight gains for cattle fed this diet exhibits an approximately 
normal distribution with mean 22 pounds and standard deviation 2 pounds. Do you think the 
sample information contradicts the manufacturer’s claim? (Calculate the probability of a weight 
gain exceeding 20 pounds.) 


How Inferences Are Made 


The mechanism instrumental in making inferences can be well illustrated by analyzing 
our own intuitive inference-making procedures. 

Suppose that two candidates are running for a public office in our community 
and that we wish to determine whether our candidate, Jones, is favored to win. The 
population of interest is the set of responses from all eligible voters who will vote on 
election day, and we wish to determine whether the fraction favoring Jones exceeds .5. 
For the sake of simplicity, suppose that all eligible voters will go to the polls and that 
we randomly select a sample of 20 from the courthouse roster of voters. All 20 are 
contacted and all favor Jones. What do you conclude about Jones’s prospects for 
winning the election? 

There is little doubt that most of us would immediately infer that Jones will win. 
This is an easy inference to make, but this inference itself is not our immediate goal. 
Rather, we wish to examine the mental processes that were employed in reaching this 
conclusion about the prospective behavior of a large voting population based on a 
sample of only 20 people. 

Winning means acquiring more than 50% of the votes. Did we conclude that Jones 
would win because we thought that the fraction favoring Jones in the sample was 
identical to the fraction favoring Jones in the population? We know that this is prob- 
ably not true. A simple experiment will verify that the fraction in the sample favoring 
Jones need not be the same as the fraction of the population who favor him. If a bal- 
anced coin is tossed, it is intuitively obvious that the true proportion of times it will 
turn up heads is .5. Yet if we sample the outcomes for our coin by tossing it 20 times, 
the proportion of heads will vary from sample to sample; that is, on one occasion 
we might observe 12 heads out of 20 flips, for a sample proportion of 12/20 = .6. 
On another occasion, we might observe 8 heads out of 20 flips, for a sample proportion
of 8/20 = .4. In fact, the sample proportion of heads could be 0, .05, .10, ...,
1.0.
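
The coin experiment is easy to imitate on a computer. The sketch below simulates five samples of 20 tosses of a balanced coin and records the sample proportion of heads in each; the seed value and the number of samples are arbitrary choices of ours, made only so that the run is repeatable.

```python
import random

rng = random.Random(1)  # fixed seed (an arbitrary choice) for repeatability
n_tosses = 20

proportions = []
for _ in range(5):  # five separate samples of 20 tosses each
    # Each toss lands heads with probability .5 for a balanced coin.
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    proportions.append(heads / n_tosses)

print(proportions)
```

Every value printed is a multiple of .05 between 0 and 1, and the values typically differ from one sample to the next even though the true proportion of heads is .5 throughout.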

Did we conclude that Jones would win because it would be impossible for 20 out 
of 20 sample voters to favor him if in fact less than 50% of the electorate intended to 
vote for him? The answer to this question is certainly no, but it provides the key to 
our hidden line of logic. It is not impossible to draw 20 out of 20 favoring Jones when 
less than 50% of the electorate favor him, but it is highly improbable. As a result, we 
concluded that he would win. 
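
To see how improbable such a sample is, suppose, as a simplifying assumption that is ours and not part of the example, that the 20 voters respond independently and that each favors Jones with the same probability p. The chance that all 20 favor him is then p raised to the 20th power. The function name below is hypothetical.

```python
def prob_all_favor(p, n=20):
    """Probability that all n sampled voters favor Jones, assuming
    independent responses, each favorable with probability p."""
    return p ** n

# Even at p = .5, the largest value consistent with Jones not winning,
# the probability of 20 out of 20 is tiny; it shrinks further as p falls.
for p in (0.5, 0.4, 0.3):
    print(f"p = {p}: {prob_all_favor(p):.3e}")
```

At p = .5 the probability is 1/1,048,576, or roughly one chance in a million, and it is smaller still for any p below .5.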

This example illustrates the potent role played by probability in making inferences.
Probabilists assume that they know the structure of the population of interest and use 
the theory of probability to compute the probability of obtaining a particular sample. 
Assuming that they know the structure of a population generated by random drawings 
of five cards from a standard deck, probabilists compute the probability that the draw 
will yield three aces and two kings. Statisticians use probability to make the trip in 
reverse—from the sample to the population. Observing five aces in a sample of five 
cards, they immediately infer that the deck (which generates the population) is loaded 
and not standard. The probability of drawing five aces from a standard deck is zero! 
This is an exaggerated case, but it makes the point. Basic to inference making is the 
problem of calculating the probability of an observed sample. As a result, probability 
is the mechanism used in making statistical inferences. 
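
The probabilist's side of the card example can be worked out by counting. The sketch below assumes that all five-card hands from a standard 52-card deck are equally likely; the counting methods themselves are developed in Chapter 2, and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Number of equally likely five-card hands from a standard 52-card deck.
hands = comb(52, 5)

# Three of the 4 aces together with two of the 4 kings.
p_three_aces_two_kings = Fraction(comb(4, 3) * comb(4, 2), hands)

# Five aces: comb(4, 5) is 0 because a standard deck holds only 4 aces.
p_five_aces = Fraction(comb(4, 5), hands)

print(hands, p_three_aces_two_kings, p_five_aces)
```

There are 2,598,960 possible hands, so three aces and two kings has probability 24/2,598,960 = 1/108,290, while five aces has probability zero, as stated above.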

One final comment is in order. If you did not think that the sample justified an 
inference that Jones would win, do not feel too chagrined. One can easily be misled 
when making intuitive evaluations of the probabilities of events. If you decided that 
the probability was very low that 20 voters out of 20 would favor Jones, assuming that 
Jones would lose, you were correct. However, it is not difficult to concoct an example 
in which an intuitive assessment of probability would be in error. Intuitive assessments 
of probabilities are unsatisfactory, and we need a rigorous theory of probability in 
order to develop methods of inference. 


Theory and Reality 


Theories are conjectures proposed to explain phenomena in the real world. As such, 
theories are approximations or models for reality. These models or explanations of 
reality are presented in verbal form in some less quantitative fields and as mathematical 
relationships in others. Whereas a theory of social change might be expressed verbally 
in sociology, a description of the motion of a vibrating string is presented in a precise 
mathematical manner in physics. When we choose a mathematical model for a phys- 
ical process, we hope that the model reflects faithfully, in mathematical terms, the 
attributes of the physical process. If so, the mathematical model can be used to arrive 
at conclusions about the process itself. If we could develop an equation to predict the 
position of a vibrating string, the quality of the prediction would depend on how well 
the equation fit the motion of the string. The process of finding a good equation is 
not necessarily simple and usually requires several simplifying assumptions (uniform 
string mass, no air resistance, etc.). The final criterion for deciding whether a model 
is “good” is whether it yields good and useful information. The motivation for using 
mathematical models lies primarily in their utility. 

This text is concerned with the theory of statistics and hence with models of reality. 
We will postulate theoretical frequency distributions for populations and will develop 
a theory of probability and inference in a precise mathematical manner. The net result 
will be a theoretical or mathematical model for acquiring and utilizing information 
in real life. The model will not be an exact representation of nature, but this should 
not disturb us. Its utility, like that of other theories, will be measured by its ability to 
assist us in understanding nature and in solving problems in the real world. 

Summary


The objective of statistics is to make an inference about a population based on infor- 
mation contained in a sample taken from that population. The theory of statistics is 
a theory of information concerned with quantifying information, designing experi- 
ments or procedures for data collection, and analyzing data. Our goal is to minimize 
the cost of a specified quantity of information and to use this information to make in- 
ferences. Most important, we have viewed making an inference about the unknown 
population as a two-step procedure. First, we enlist a suitable inferential procedure 
for the given situation. Second, we seek a measure of the goodness of the resulting 
inference. For example, every estimate of a population characteristic based on infor- 
mation contained in the sample might have associated with it a probabilistic bound 
on the error of estimation. 

A necessary prelude to making inferences about a population is the ability to de- 
scribe a set of numbers. Frequency distributions provide a graphic and useful method 
for characterizing conceptual or real populations of numbers. Numerical descriptive 
measures are often more useful when we wish to make an inference and measure the 
goodness of that inference. 

The mechanism for making inferences is provided by the theory of probability. The 
probabilist reasons from a known population to the outcome of a single experiment, 
the sample. In contrast, the statistician utilizes the theory of probability to calculate 
the probability of an observed sample and to infer from this the characteristics of an 
unknown population. Thus, probability is the foundation of the theory of statistics. 

Finally, we have noted the difference between theory and reality. In this text, we 
will study the mathematical theory of statistics, which is an idealization of nature. It 
is rigorous, mathematical, and subject to study in a vacuum completely isolated from 
the real world. Or it can be tied very closely to reality and can be useful in making 
inferences from data in all fields of science. In this text, we will be utilitarian. We will 
not regard statistics as a branch of mathematics but as an area of science concerned 
with developing a practical theory of information. We will consider statistics as a 
separate field, analogous to physics—not as a branch of mathematics but as a theory 
of information that utilizes mathematics heavily. 

Subsequent chapters will expand on the topics that we have encountered in this 
introduction. We will begin with a study of the mechanism employed in making 
inferences, the theory of probability. This theory provides theoretical models for 
generating experimental data and thereby provides the basis for our study of statistical 
inference. 

Supplementary Exercises


1.22 Prove that the sum of the deviations of a set of measurements about their mean is equal to zero;
that is,

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0.$$

1.23 The mean duration of television commercials is 75 seconds with standard deviation 20 seconds.
Assume that the durations are approximately normally distributed to answer the following.

a What percentage of commercials last longer than 95 seconds?

b What percentage of the commercials last between 35 and 115 seconds?

c Would you expect a commercial to last longer than 2 minutes? Why or why not?


1.24 Aqua running has been suggested as a method of cardiovascular conditioning for injured
athletes and others who desire a low-impact aerobics program. In a study to investigate the
relationship between exercise cadence and heart rate,¹ the heart rates of 20 healthy volunteers
were measured at a cadence of 48 cycles per minute (a cycle consisted of two steps). The data
are as follows:


 87  109   79   80   96   95   90   92   96   98
101   91   78  112   94   98   94  107   81   96


a Use the range of the measurements to obtain an estimate of the standard deviation.

b Construct a frequency histogram for the data. Use the histogram to obtain a visual approximation
to $\bar{y}$ and $s$.

c Calculate $\bar{y}$ and $s$. Compare these results with the calculation checks provided by parts (a)
and (b).

d Construct the intervals $\bar{y} \pm ks$, $k = 1$, 2, and 3, and count the number of measurements
falling in each interval. Compare the fractions falling in the intervals with the fractions that
you would expect according to the empirical rule.


1. R. P. Wilder, D. Breenan, and D. E. Schotte, “A Standard Measure for Exercise Prescription for Aqua
Running,” American Journal of Sports Medicine 21(1) (1993): 45.

1.25 The following data give the lengths of time to failure for n = 88 radio transmitter-receivers:


16 224 16 80 96 536 400 80 
392 576 128 56 656 224 40 32 
358 384 256 246 328 464 448 716 
304 16 72 8 80 72 56 608 
108 194 136 224 80 16 424 264 
156 216 168 184 552 72 184 240 
438 120 308 32 272 152 328 480 

60 208 340 104 72 168 40 152 
360 232 40 112 112 288 168 352 

56 72 64 40 184 264 96 224 
168 168 114 280 152 208 160 176


a Use the range to approximate $s$ for the $n = 88$ lengths of time to failure.

b Construct a frequency histogram for the data. [Notice the tendency of the distribution to
tail outward (skew) to the right.]

c Use a calculator (or computer) to calculate $\bar{y}$ and $s$. (Hand calculation is much too tedious
for this exercise.)

d Calculate the intervals $\bar{y} \pm ks$, $k = 1$, 2, and 3, and count the number of measurements
falling in each interval. Compare your results with the empirical rule results. Note that the
empirical rule provides a rather good description of these data, even though the distribution
is highly skewed.


1.26 Compare the ratio of the range to s for the three sample sizes (n = 6, 20, and 88) for
Exercises 1.12, 1.24, and 1.25. Note that the ratio tends to increase as the amount of data 
increases. The greater the amount of data, the greater will be their tendency to contain a few 
extreme values that will inflate the range and have relatively little effect on s. We ignored this 
phenomenon and suggested that you use 4 as the ratio for finding a guessed value of s in checking 
calculations. 


1.27 A set of 340 examination scores exhibiting a bell-shaped relative frequency distribution has a
mean of $\bar{y} = 72$ and a standard deviation of $s = 8$. Approximately how many of the scores
would you expect to fall in the interval from 64 to 80? The interval from 56 to 88?


1.28 The discharge of suspended solids from a phosphate mine is normally distributed with mean
daily discharge 27 milligrams per liter (mg/L) and standard deviation 14 mg/L. In what pro- 
portion of the days will the daily discharge be less than 13 mg/L? 


1.29 A machine produces bearings with mean diameter 3.00 inches and standard deviation 0.01 inch.
Bearings with diameters in excess of 3.02 inches or less than 2.98 inches will fail to meet quality 
specifications. 


a Approximately what fraction of this machine's production will fail to meet specifications? 


b What assumptions did you make concerning the distribution of bearing diameters in order 
to answer this question? 


Compared to their stay-at-home peers, women employed outside the home have higher levels 
of high-density lipoproteins (HDL), the “good” cholesterol associated with lower risk for heart 
attacks. A study of cholesterol levels in 2000 women, aged 25–64, living in Augsburg, Germany,
was conducted by Ursula Haertel, Ulrigh Keil, and colleagues² at the GSF-Medis Institut in
Munich. Of these 2000 women, the 48% who worked outside the home had HDL levels that were
between 2.5 and 3.6 milligrams per deciliter (mg/dL) higher than the HDL levels of their
stay-at-home counterparts. Suppose that the difference in HDL levels is normally distributed, with
mean 0 (indicating no difference between the two groups of women) and standard deviation
1.2 mg/dL. If you were to select an employed woman and a stay-at-home counterpart at
random, what is the probability that the difference in their HDL levels would be between 1.2
and 2.4?


2. Science News 135 (June 1989): 389.


Over the past year, a fertilizer production process has shown an average daily yield of 60 tons 
with a variance in daily yields of 100. If the yield should fall to less than 40 tons tomorrow, 
should this result cause you to suspect an abnormality in the process? (Calculate the probability 
of obtaining less than 40 tons.) What assumptions did you make concerning the distribution of 
yields? 


Let k ≥ 1. Show that, for any set of n measurements, the fraction included in the interval ȳ − ks
to ȳ + ks is at least (1 − 1/k²). [Hint:


$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right].$$


In this expression, replace all deviations for which |yᵢ − ȳ| ≥ ks with ks. Simplify.] This result
is known as Tchebysheff’s theorem.³


A personnel manager for a certain industry has records of the number of employees absent 
per day. The average number absent is 5.5, and the standard deviation is 2.5. Because there 
are many days with zero, one, or two absent and only a few with more than ten absent, the 
frequency distribution is highly skewed. The manager wants to publish an interval in which at 
least 75% of these values lie. Use the result in Exercise 1.32 to find such an interval.


For the data discussed in Exercise 1.33, give an upper bound to the fraction of days when there 
are more than 13 absentees. 


A pharmaceutical company wants to know whether an experimental drug has an effect on 
systolic blood pressure. Fifteen randomly selected subjects were given the drug and, after 
sufficient time for the drug to have an impact, their systolic blood pressures were recorded. 
The data appear below: 


172 140 123 130 115 
148 108 129 137 161 
123 152 133 128 142 


a Approximate the value of s using the range approximation.
b Calculate the values of ȳ and s for the 15 blood pressure readings.
c Use Tchebysheff's theorem (Exercise 1.32) to find values a and b such that at least 75%
of the blood pressure measurements lie between a and b.

d Did Tchebysheff's theorem work? That is, use the data to find the actual percent of blood
pressure readings that are between the values a and b you found in part (c). Is this actual
percentage greater than 75%?


A random sample of 100 foxes was examined by a team of veterinarians to determine the
prevalence of a specific parasite. Counting the number of parasites of this specific type, the
veterinarians found that 69 foxes had no parasites of the type of interest, 17 had one parasite of the
type under study, and so on. A summary of their results is given in the following table:


Number of Parasites | 0 1 2 3 4 5 6 7 8 
Number of Foxes | 69 17 6 3 1 2 1 0 1 


a Construct the relative frequency histogram for the number of parasites per fox. 


b Calculate ȳ and s for the data given.
c What fraction of the parasite counts falls within 2 standard deviations of the mean? Within
3 standard deviations? Do your results agree with Tchebysheff's theorem (Exercise 1.32)
and/or the empirical rule?


Studies indicate that drinking water supplied by some old lead-lined city piping systems may 
contain harmful levels of lead. Based on data presented by Karalekas and colleagues,⁴ it appears
that the distribution of lead content readings for individual water specimens has mean .033 mg/L
and standard deviation .10 mg/L. Explain why it is obvious that the lead content readings are
not normally distributed.


In Exercise 1.19, the mean and standard deviation of the amount of chloroform present in water 
sources were given to be 34 and 53, respectively. You argued that the amounts of chloroform 
could therefore not be normally distributed. Use Tchebysheff's theorem (Exercise 1.32) to
describe the distribution of chloroform amounts in water sources.


4. P. C. Karalekas, Jr., C. K. Ryan, and F. B. Taylor, “Control of Lead, Copper and Iron Pipe Corrosion in
Boston,” American Water Works Journal (February 1983): 92.


Introduction


In everyday conversation, the term probability is a measure of one's belief in the 
occurrence of a future event. We accept this as a meaningful and practical interpretation
of probability but seek a clearer understanding of its context, how it is measured,
and how it assists in making inferences. 

The concept of probability is necessary in work with physical, biological, or so- 
cial mechanisms that generate observations that cannot be predicted with certainty. 
For example, the blood pressure of a person at a given point in time cannot be pre- 
dicted with certainty, and we never know the exact load that a bridge will endure 
before collapsing into a river. Such random events cannot be predicted with certainty, 
but the relative frequency with which they occur in a long series of trials is often 
remarkably stable. Events possessing this property are called random, or stochastic, 
events. This stable long-term relative frequency provides an intuitively meaningful
measure of our belief in the occurrence of a random event if a future observation is
to be made. It is impossible, for example, to predict with certainty the occurrence of 
heads on a single toss of a balanced coin, but we would be willing to state with a fair 
measure of confidence that the fraction of heads in a long series of trials would be 
very near .5. That this relative frequency is commonly used as a measure of belief in 
the outcome for a single toss is evident when we consider chance from a gambler's 
perspective. He risks money on the single toss of a coin, not a long series of tosses. 
The relative frequency of a head in a long series of tosses, which a gambler calls the 
probability of a head, gives him a measure of the chance of winning on a single toss. If 
the coin were unbalanced and gave 90% heads in a long series of tosses, the gambler 
would say that the probability of a head is .9, and he would be fairly confident in the 
occurrence of a head on a single toss of the coin. 
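
The stability of long-run relative frequency can be illustrated with a small simulation. The following Python sketch is not part of the original text: the function name `relative_frequency_of_heads`, the seed, and the numbers of tosses are illustrative assumptions, and a pseudorandom generator stands in for a physical coin.

```python
import random


def relative_frequency_of_heads(n_tosses, p_head=0.5, seed=0):
    """Simulate n_tosses of a coin and return the fraction that land heads.

    p_head is the assumed long-run probability of a head (0.5 for a
    balanced coin); seed fixes the pseudorandom sequence so that runs
    are reproducible.
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_head)
    return heads / n_tosses


if __name__ == "__main__":
    # The fraction of heads typically settles near p_head as n grows.
    for n in (10, 100, 10_000, 100_000):
        print(n, relative_frequency_of_heads(n))
    # An unbalanced coin, like the gambler's 90% coin in the text.
    print(relative_frequency_of_heads(100_000, p_head=0.9))
```

For short runs the printed fraction can be far from .5; it is only over a long series of tosses that the relative frequency becomes stable enough to serve as a measure of belief for a single toss.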

The preceding example possesses some realistic and practical analogies. In many 
respects all people are gamblers. The research physician gambles time and money 
on a research project, and she is concerned with her success on a single flip of this 
symbolic coin. Similarly, the investment of capital in a new manufacturing plant is 
a gamble that represents a single flip of a coin on which the entrepreneur has high 
hopes for success. The fraction of similar investments that are successful in a long 
series of trials is of interest to the entrepreneur only insofar as it provides a measure 
of belief in the successful outcome of a single individual investment. 

The relative frequency concept of probability, although intuitively meaningful, 
does not provide a rigorous definition of probability. Many other concepts of proba- 
bility have been proposed, including that of subjective probability, which allows the 
probability of an event to vary depending upon the person performing the evaluation. 
Nevertheless, for our purposes we accept an interpretation based on relative frequency 
as a meaningful measure of our belief in the occurrence of an event. Next, we will 
examine the link that probability provides between observation and inference. 


Probability and Inference 


The role that probability plays in making inferences will be discussed in detail after 
an adequate foundation has been laid for the theory of probability. At this point we 
will present an elementary treatment of this theory through an example and an appeal 
to your intuition. 

The example selected is similar to that presented in Section 1.4 but simpler and 
less practical. It was chosen because of the ease with which we can visualize the 
population and sample and because it provides an observation-producing mechanism 
for which a probabilistic model will be constructed in Section 2.3. 

Consider a gambler who wishes to make an inference concerning the balance 
of a die. The conceptual population of interest is the set of numbers that would be 
generated if the die were rolled over and over again, ad infinitum. If the die were 
perfectly balanced, one-sixth of the measurements in this population would be 1s, 
one-sixth, 2s, one-sixth, 3s, and so on. The corresponding frequency distribution is 
shown in Figure 2.1. 

Using the scientific method, the gambler proposes the hypothesis that the die is 
balanced, and he seeks observations from nature to contradict the theory, if false.
A sample of ten tosses is selected from the population by rolling the die ten times. All
ten tosses result in 1s. The gambler looks upon this output of nature with a jaundiced 
eye and concludes that his hypothesis is not in agreement with nature and hence that 
the die is not balanced. 

The reasoning employed by the gambler identifies the role that probability plays 
in making inferences. The gambler rejected his hypothesis (and concluded that the 
die is unbalanced) not because it is impossible to throw ten 1s in ten tosses of a 
balanced die but because it is highly improbable. His evaluation of the probability 
was most likely subjective. That is, the gambler may not have known how to calculate 
the probability of ten 1s in ten tosses, but he had an intuitive feeling that this event 
was highly unlikely if the die were balanced. The point to note is that his decision 
was based on the probability of the observed sample. 

The need for a theory of probability that will provide a rigorous method for finding a 
number (a probability) that will agree with the actual relative frequency of occurrence 
of an event in a long series of trials is apparent if we imagine a different result for the 
gambler's sample. Suppose, for example, that instead of ten 1s, he observed five 1s
along with two 2s, one 3, one 4, and one 6. Is this result so improbable that we should
reject our hypothesis that the die is balanced and conclude that the die is loaded in
favor of 1s? If we must rely solely on experience and intuition to make our evaluation,
it is not so easy to decide whether the probability of five 1s in ten tosses is large or
small. The probability of throwing four 1s in ten tosses would be even more difficult to
guess. We will not deny that experimental results often are obviously inconsistent with 
a given hypothesis and lead to its rejection. However, many experimental outcomes 
fall in a gray area where we require a rigorous assessment of the probability of their 
occurrence. Indeed, it is not difficult to show that intuitive evaluations of probabilities 
often lead to answers that are substantially in error and result in incorrect inferences 
about the target population. For example, if there are 20 people in a room, most people 
would guess that it is very unlikely that there would be two or more persons with the 
same birthday. Yet, under certain reasonable assumptions, in Example 2.18 we will 
show that the probability of such an occurrence is larger than .4, a number that is 
surprisingly large to many. 
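
The probabilities mentioned in this section can be computed directly once the counting methods of later sections are available. The Python sketch below is an illustration rather than the text's own derivation. It assumes independent tosses of a balanced die and, for the birthday calculation, 365 equally likely and independent birthdays; all function names are hypothetical.

```python
from math import comb, prod


def prob_all_ones(n_tosses):
    """P(every one of n_tosses shows a 1) for a balanced die."""
    return (1 / 6) ** n_tosses


def prob_exactly_k_ones(k, n_tosses):
    """P(exactly k ones in n_tosses) for a balanced die.

    Counts the comb(n_tosses, k) possible positions for the 1s and
    multiplies by the probability of any one such sequence.
    """
    return comb(n_tosses, k) * (1 / 6) ** k * (5 / 6) ** (n_tosses - k)


def prob_shared_birthday(n_people, days=365):
    """P(at least two of n_people share a birthday).

    Assumes birthdays are independent and equally likely on each of
    `days` days; computed as 1 - P(all birthdays distinct).
    """
    p_all_distinct = prod((days - i) / days for i in range(n_people))
    return 1 - p_all_distinct


if __name__ == "__main__":
    print(prob_all_ones(10))           # ten 1s in ten tosses
    print(prob_exactly_k_ones(5, 10))  # five 1s in ten tosses
    print(prob_exactly_k_ones(4, 10))  # four 1s in ten tosses
    print(prob_shared_birthday(20))    # exceeds .4, as the text states
```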

We need a theory of probability that will permit us to calculate the probability (or 
a quantity proportional to the probability) of observing specified outcomes, assuming 
that our hypothesized model is correct. This topic will be developed in detail in 
subsequent chapters. Our immediate goal is to present an introduction to the theory 
of probability, which provides the foundation for modern statistical inference. We will
begin by reviewing some set notation that will be used in constructing probabilistic
models for experiments. 


A Review of Set Notation 


To proceed with an orderly development of probability theory, we need some basic 
concepts of set theory. We will use capital letters, A, B, C, ..., to denote sets of
points. If the elements in the set A are a₁, a₂, and a₃, we will write


A = {a₁, a₂, a₃}.


Let S denote the set of all elements under consideration; that is, S is the universal
set. For any two sets A and B, we will say that A is a subset of B, or A is contained in
B (denoted A ⊂ B), if every point in A is also in B. The null, or empty, set, denoted
by Ø, is the set consisting of no points. Thus, Ø is a subset of every set.

Sets and relationships between sets can be conveniently portrayed by using Venn 
diagrams. The Venn diagram in Figure 2.2 shows two sets, A and B, in the universal 
set S. Set A is the set of all points inside the triangle; set B is the set of all points 
inside the circle. Note that in Figure 2.2, A ⊂ B.

Consider now two arbitrary sets of points. The union of A and B, denoted by
A ∪ B, is the set of all points in A or B or both. That is, the union of A and B contains
all points that are in at least one of the sets. The Venn diagram in Figure 2.3 shows
two sets A and B, where A is the set of points in the left-hand circle and B is the set
of points in the right-hand circle. The set A ∪ B is the shaded region consisting of
all points inside either circle (or both). The key word for expressing the union of two
sets is or (meaning A or B or both).


FIGURE 2.4
Venn diagram for AB


FIGURE 2.5
Venn diagram for Ā

The intersection of A and B, denoted by A ∩ B or by AB, is the set of all points in
both A and B. The Venn diagram of Figure 2.4 shows two sets A and B, with A ∩ B
consisting of the points in the shaded region where the two sets overlap. The key word
for expressing intersections is and (meaning A and B simultaneously).

If A is a subset of S, then the complement of A, denoted by Ā, is the set of points
that are in S but not in A. Figure 2.5 is a Venn diagram illustrating that the shaded
area in S but not in A is Ā. Note that A ∪ Ā = S.

Two sets, A and B, are said to be disjoint, or mutually exclusive, if A ∩ B = Ø. That
is, mutually exclusive sets have no points in common. The Venn diagram in Figure 2.6
illustrates two sets A and B that are mutually exclusive. Referring to Figure 2.5, it is
easy to see that, for any set A, A and Ā are mutually exclusive.

Consider the die-tossing problem of Section 2.2 and let S denote the set of all possible
numerical observations for a single toss of a die. That is, S = {1, 2, 3, 4, 5, 6}.
Let A = {1, 2}, B = {1, 3}, and C = {2, 4, 6}. Then A ∪ B = {1, 2, 3}, A ∩ B = {1},
and Ā = {3, 4, 5, 6}. Also, note that B and C are mutually exclusive, whereas A and
C are not.
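
The die-tossing sets above map directly onto Python's built-in `set` type. The short sketch below is an illustration only; the variable names are assumptions, and the complement is taken relative to the universal set S.

```python
# Universal set for a single toss of a die, and the three subsets from the text.
S = {1, 2, 3, 4, 5, 6}
A = {1, 2}
B = {1, 3}
C = {2, 4, 6}

union_AB = A | B                 # points in A or B or both
intersection_AB = A & B          # points in both A and B
complement_A = S - A             # points in S but not in A

B_and_C_disjoint = B.isdisjoint(C)   # True: B and C share no points
A_and_C_disjoint = A.isdisjoint(C)   # False: both contain 2

if __name__ == "__main__":
    print(union_AB, intersection_AB, complement_A)
    print(B_and_C_disjoint, A_and_C_disjoint)
```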


FIGURE 2.6
Venn diagram for mutually exclusive sets A and B


We will not attempt a thorough review of set algebra, but we mention four equalities
of considerable importance. These are the distributive laws, given by


$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C),$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C),$$


and DeMorgan's laws:


$$\overline{(A \cap B)} = \overline{A} \cup \overline{B} \quad\text{and}\quad \overline{(A \cup B)} = \overline{A} \cap \overline{B}.$$


In the next section we will proceed with an elementary discussion of probability 
theory. 
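
The distributive laws and DeMorgan's laws can be spot-checked by brute force on a small universal set. The following Python sketch is an illustration, not a proof: it checks every choice of subsets of one finite set, and the helper names `all_subsets` and `laws_hold` are hypothetical.

```python
from itertools import combinations


def all_subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def laws_hold(universe):
    """Check the distributive and DeMorgan laws for all subsets A, B, C."""
    S = frozenset(universe)
    subsets = all_subsets(S)
    for A in subsets:
        for B in subsets:
            # DeMorgan's laws (complements taken relative to S).
            if S - (A & B) != (S - A) | (S - B):
                return False
            if S - (A | B) != (S - A) & (S - B):
                return False
            for C in subsets:
                # Distributive laws.
                if A & (B | C) != (A & B) | (A & C):
                    return False
                if A | (B & C) != (A | B) & (A | C):
                    return False
    return True


if __name__ == "__main__":
    print(laws_hold({1, 2, 3, 4, 5, 6}))
```

A Venn diagram argument, as requested in the exercises, establishes the laws for arbitrary sets; the exhaustive check only confirms them for the chosen universe.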


Exercises 


Suppose a family contains two children of different ages, and we are interested in the gender 
of these children. Let F denote that a child is female and M that the child is male and let a 
pair such as F M denote that the older child is female and the younger is male. There are four 
points in the set S of possible observations: 


S = {FF, FM, MF, MM}.


Let A denote the subset of possibilities containing no males; B, the subset containing two 
males; and C, the subset containing at least one male. List the elements of A, B, C, A ∩ B,
A ∪ B, A ∩ C, A ∪ C, B ∩ C, B ∪ C, and C ∩ B̄.


Suppose that A and B are two events. Write expressions involving unions, intersections, and
complements that describe the following:

a Both events occur.
b At least one occurs.
c Neither occurs.
d Exactly one occurs.


Draw Venn diagrams to verify DeMorgan’s laws. That is, for any two sets A and B,
$\overline{(A \cup B)} = \overline{A} \cap \overline{B}$ and $\overline{(A \cap B)} = \overline{A} \cup \overline{B}$.


If A and B are two sets, draw Venn diagrams to verify the following:


a A = (A ∩ B) ∪ (A ∩ B̄).
b If B ⊂ A then A = B ∪ (A ∩ B̄).


Refer to Exercise 2.4. Use the identities A = A ∩ S and S = B ∪ B̄ and a distributive law to
prove that


a A = (A ∩ B) ∪ (A ∩ B̄).

b If B ⊂ A then A = B ∪ (A ∩ B̄).

c Further, show that (A ∩ B) and (A ∩ B̄) are mutually exclusive and therefore that A is the
union of two mutually exclusive sets, (A ∩ B) and (A ∩ B̄).


d Also show that B and (A ∩ B̄) are mutually exclusive and if B ⊂ A, A is the union of two
mutually exclusive sets, B and (A ∩ B̄).


From a survey of 60 students attending a university, it was found that 9 were living off campus, 
36 were undergraduates, and 3 were undergraduates living off campus. Find the number of 
these students who were 


a undergraduates, were living off campus, or both. 
b undergraduates living on campus. 
c graduate students living on campus.


A group of five applicants for a pair of identical jobs consists of three men and two women. The 
employer is to select two of the five applicants for the jobs. Let S denote the set of all possible 
outcomes for the employer's selection. Let A denote the subset of outcomes corresponding to 
the selection of two men and B the subset corresponding to the selection of at least one woman. 
List the outcomes in A, B, A ∪ B, A ∩ B, and A ∩ B̄. (Denote the different men and women
by M₁, M₂, M₃ and W₁, W₂, respectively.)


Suppose two dice are tossed and the numbers on the upper faces are observed. Let S denote
the set of all possible pairs that can be observed. [These pairs can be listed, for example, by 
letting (2, 3) denote that a 2 was observed on the first die and a 3 on the second.] 


a Define the following subsets of S:


A: The number on the second die is even.
B: The sum of the two numbers is even.
C: At least one number in the pair is odd.


b List the points in A, C, A ∩ B, A ∩ B̄, A ∪ B, and A ∩ C.


A Probabilistic Model for an Experiment: 
The Discrete Case 


In Section 2.2 we referred to the die-tossing experiment when we observed the number 
appearing on the upper face. We will use the term experiment to include observations 
obtained from completely uncontrollable situations (such as observations on the daily 
price of a particular stock) as well as those made under controlled laboratory condi- 
tions. We have the following definition: 


DEFINITION 2.1
An experiment is the process by which an observation is made.


Examples of experiments include coin and die tossing, measuring the IQ score of an 
individual, or determining the number of bacteria per cubic centimeter in a portion 
of processed food. 

When an experiment is performed, it can result in one or more outcomes, which 
are called events. In our discussions, events will be denoted by capital letters. If the 
experiment consists of counting the number of bacteria in a portion of food, some 
events of interest could be 


A: Exactly 110 bacteria are present. 
B: More than 200 bacteria are present. 
C: The number of bacteria present is between 100 and 300. 


Some events associated with a single toss of a balanced die are these:


A: Observe an odd number.
B: Observe a number less than 5.
C: Observe a 2 or a 3.
E₁: Observe a 1.
E₂: Observe a 2.
E₃: Observe a 3.
E₄: Observe a 4.
E₅: Observe a 5.
E₆: Observe a 6.


You can see that there is a distinct difference among some of the events associated
with the die-tossing experiment. For example, if you observe event A (an odd number),
at the same time you will have observed E₁, E₃, or E₅. Thus, event A, which can be
decomposed into three other events, is called a compound event. In contrast, the events
E₁, E₂, E₃, E₄, E₅, and E₆ cannot be decomposed and are called simple events. A
simple event can happen only in one way, whereas a compound event can happen in
more than one distinct way.

Certain concepts from set theory are useful for expressing the relationships between 
various events associated with an experiment. Because sets are collections of points, 
we associate a distinct point, called a sample point, with each and every simple event 
associated with an experiment. 


DEFINITION 2.2
A simple event is an event that cannot be decomposed. Each simple event
corresponds to one and only one sample point. The letter E with a subscript
will be used to denote a simple event or the corresponding sample point.


Thus, we can think of a simple event as a set consisting of a single point, namely,
the single sample point associated with the event.


DEFINITION 2.3
The sample space associated with an experiment is the set consisting of all
possible sample points. A sample space will be denoted by S.


We can easily see that the sample space S associated with the die-tossing experiment
consists of six sample points corresponding to the six simple events E₁, E₂, E₃,
E₄, E₅, and E₆. That is, S = {E₁, E₂, E₃, E₄, E₅, E₆}. A Venn diagram exhibiting
the sample space for the die-tossing experiment is given in Figure 2.7.

For the microbiology example of counting bacteria in a food specimen, let E₀
correspond to observing 0 bacteria, E₁ correspond to observing 1 bacterium, and so
on. Then the sample space is


S = {E₀, E₁, E₂, ...}


because no integer number of bacteria can be ruled out as a possible outcome. 

Both sample spaces that we examined have the property that they consist of either 
a finite or a countable number of sample points. In the die-tossing example, there are 
six (a finite number) sample points. The number of sample points associated with 
the bacteria-counting experiment is infinite, but the number of distinct sample points 
can be put into a one-to-one correspondence with the integers (that is, the number of 
sample points is countable). Such sample spaces are said to be discrete. 


DEFINITION 2.4
A discrete sample space is one that contains either a finite or a countable number
of distinct sample points.


When an experiment is conducted a single time, you will observe one and only one 
simple event. For example, if you toss a die and observe a 1, you cannot at the same 
time observe a 2. Thus, the single sample point E₁ associated with observing a 1 and
the single sample point E₂ associated with observing a 2 are distinct, and the sets {E₁}
and {E₂} are mutually exclusive sets. Thus, events E₁ and E₂ are mutually exclusive
events. Similarly, all distinct simple events correspond to mutually exclusive sets of 
simple events and are thus mutually exclusive events. 

For experiments with discrete sample spaces, compound events can be viewed as
collections (sets) of sample points or, equivalently, as unions of the sets of single
sample points corresponding to the appropriate simple events. For example, the
die-tossing event A (observe an odd number) will occur if and only if one of the simple
events E₁, E₃, or E₅ occurs. Thus,


A = {E₁, E₃, E₅} or A = E₁ ∪ E₃ ∪ E₅.


Similarly, B (observe a number less than 5) can be written as


B = {E₁, E₂, E₃, E₄} or B = E₁ ∪ E₂ ∪ E₃ ∪ E₄.


The rule for determining which simple events to include in a compound event is very
precise. A simple event Eᵢ is included in event A if and only if A occurs whenever Eᵢ
occurs.


DEFINITION 2.5
An event in a discrete sample space S is a collection of sample points, that is,
any subset of S.


Figure 2.8 gives a Venn diagram representing the sample space and events 
A (observe an odd number) and B (observe a number less than 5) for the die-tossing 
experiment. Notice that it is easy to visualize the relationship between events by using 
a Venn diagram. 

By Definition 2.5, any event in a discrete sample space S is a subset of S. In the 
example concerning counting bacteria in a portion of food, the event B (the number 
of bacteria is more than 200) can be expressed as


B = {E₂₀₁, E₂₀₂, E₂₀₃, ...},


where Eᵢ denotes the simple event that there are i bacteria present in the food sample
and i = 0, 1, 2, ....

A probabilistic model for an experiment with a discrete sample space can be 
constructed by assigning a numerical probability to each simple event in the sample 
space S. We will select this number, a measure of our belief in the event’s occur- 
rence on a single repetition of the experiment, in such a way that it will be consistent 
with the relative frequency concept of probability. Although relative frequency does 
not provide a rigorous definition of probability, any definition applicable to the real 
world should agree with our intuitive notion of the relative frequencies of events. 

On analyzing the relative frequency concept of probability, we see that three con- 
ditions must hold. 


1. The relative frequency of occurrence of any event must be greater than or equal
to zero. A negative relative frequency does not make sense.

2. The relative frequency of the whole sample space S must be unity. Because
every possible outcome of the experiment is a point in S, it follows that S must
occur every time the experiment is performed.

3. If two events are mutually exclusive, the relative frequency of their union is the
sum of their respective relative frequencies. (For example, if the experiment of
tossing a balanced die yields a 1 on 1/6 of the tosses, it should yield a 1 or a 2
on 1/6 + 1/6 = 1/3 of the tosses.)


These three conditions form the basis of the following definition of probability.


DEFINITION 2.6
Suppose S is a sample space associated with an experiment. To every event A
in S (A is a subset of S), we assign a number, P(A), called the probability of
A, so that the following axioms hold:


Axiom 1: P(A) ≥ 0.

Axiom 2: P(S) = 1.

Axiom 3: If A₁, A₂, A₃, ... form a sequence of pairwise mutually
exclusive events in S (that is, Aᵢ ∩ Aⱼ = Ø if i ≠ j), then


$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i).$$


We can easily show that Axiom 3, which is stated in terms of an infinite sequence of 
events, implies a similar property for a finite sequence. Specifically, if A₁, A₂, ..., Aₙ
are pairwise mutually exclusive events, then


$$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i).$$
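
One way to fill in the omitted step is sketched below. It is a standard argument offered as an illustration, not necessarily the derivation the authors have in mind, and it relies on the fact that the empty set is mutually exclusive with every event.

```latex
% Step 1: P(\emptyset) = 0. Take A_1 = A_2 = \cdots = \emptyset in Axiom 3.
% These sets are pairwise mutually exclusive and their union is \emptyset, so
\[
  P(\emptyset) = \sum_{i=1}^{\infty} P(\emptyset),
\]
% which a finite, nonnegative number can satisfy only if P(\emptyset) = 0.

% Step 2: extend the finite sequence with empty sets,
% A_{n+1} = A_{n+2} = \cdots = \emptyset, and apply Axiom 3.
\[
  P(A_1 \cup \cdots \cup A_n)
    = P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr)
    = \sum_{i=1}^{\infty} P(A_i)
    = \sum_{i=1}^{n} P(A_i) + \sum_{i=n+1}^{\infty} P(\emptyset)
    = \sum_{i=1}^{n} P(A_i).
\]
```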


Notice that the definition states only the conditions an assignment of probabilities 
must satisfy; it does not tell us how to assign specific probabilities to events. For 
example, suppose that a coin has yielded 800 heads in 1000 previous tosses. Consider 
the experiment of one more toss of the same coin. There are two possible outcomes, 
head or tail, and hence two simple events. The definition of probability allows us to 
assign to these simple events any two nonnegative numbers that add to 1. For example, 
each simple event could have the probability 1/2. In light of the past history of this 
coin, however, it might be more reasonable to assign a probability nearer .8 to the 
outcome involving a head. Specific assignments of probabilities must be consistent 
with reality if the probabilistic model is to serve a useful purpose. 

For discrete sample spaces, it suffices to assign probabilities to each simple event. If 
a balanced die is used for the die-tossing example, it seems reasonable to assume that 
all simple events would have the same relative frequency in the long run. We will assign 
a probability of 1/6 to each simple event: P(Eᵢ) = 1/6, for i = 1, 2, ..., 6. This
assignment of probabilities agrees with Axiom 1. To see that Axiom 2 is satisfied, write


P(S) = P(E₁ ∪ E₂ ∪ ··· ∪ E₆) = P(E₁) + P(E₂) + ··· + P(E₆) = 1.


The second equality follows because Axiom 3 must hold. Axiom 3 also tells us that
we can calculate the probability of any event by summing the probabilities of the
simple events contained in that event (recall that distinct simple events are mutually
exclusive). Event A was defined to be “observe an odd number.” Hence,


P(A) = P(E₁ ∪ E₃ ∪ E₅) = P(E₁) + P(E₃) + P(E₅) = 1/2.
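
The same bookkeeping can be written out in a few lines of Python. This is a minimal sketch under the balanced-die assumption; the names `simple_event_probs` and `prob` are hypothetical, and exact fractions are used so that the sums come out as 1 and 1/2 without rounding.

```python
from fractions import Fraction

# Probabilistic model for one toss of a balanced die:
# each simple event E_i (observe i) gets probability 1/6.
simple_event_probs = {i: Fraction(1, 6) for i in range(1, 7)}


def prob(event, probs=simple_event_probs):
    """P(event): sum the probabilities of the sample points in the event."""
    return sum((probs[point] for point in event), Fraction(0))


S = set(simple_event_probs)   # whole sample space
A = {1, 3, 5}                 # observe an odd number
B = {1, 2, 3, 4}              # observe a number less than 5

if __name__ == "__main__":
    print(prob(S), prob(A), prob(B))
```

Replacing the dictionary with any other nonnegative values that sum to 1 gives a different, equally valid model in the sense of Definition 2.6; whether it is a useful model depends on how well it matches the die.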


EXAMPLE 2.1
A manufacturer has five seemingly identical computer terminals available for shipping.
Unknown to her, two of the five are defective. A particular order calls for
two of the terminals and is filled by randomly selecting two of the five that are
available.

a List the sample space for this experiment.
b Let A denote the event that the order is filled with two nondefective terminals.
List the sample points in A.
c Construct a Venn diagram for the experiment that illustrates event A.
d Assign probabilities to the simple events in such a way that the information
about the experiment is used and the axioms in Definition 2.6 are met.
e Find the probability of event A.


Solution
a Let the two defective terminals be labeled D₁ and D₂ and let the three good
terminals be labeled G₁, G₂, and G₃. Any single sample point will consist of
a list of the two terminals selected for shipment. The simple events may be
denoted by


E₁ = {D₁, D₂}, E₂ = {D₁, G₁}, E₃ = {D₁, G₂}, E₄ = {D₁, G₃}, E₅ = {D₂, G₁},
E₆ = {D₂, G₂}, E₇ = {D₂, G₃}, E₈ = {G₁, G₂}, E₉ = {G₁, G₃}, E₁₀ = {G₂, G₃}.


Thus, there are ten sample points in S, and S = {E₁, E₂, ..., E₁₀}.


b Event A = {E₈, E₉, E₁₀}.
c (Venn diagram: the ten sample points of S, with E₈, E₉, and E₁₀ enclosed in the region A.)


d Because the terminals are selected at random, any pair of terminals is as likely
to be selected as any other pair. Thus, P(Eᵢ) = 1/10, for i = 1, 2, ..., 10, is a
reasonable assignment of probabilities.
e Because A = E₈ ∪ E₉ ∪ E₁₀, Axiom 3 implies that


P(A) = P(E₈) + P(E₉) + P(E₁₀) = 3/10.


The next section contains an axiomatic description of the method for calculating 
P(A) that we just used. 
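
Example 2.1 can also be reproduced by enumerating the sample space mechanically. The Python sketch below is an illustration only; the labels "D1", "D2", "G1", "G2", "G3" and the variable names are assumptions chosen to mirror the solution's notation.

```python
from fractions import Fraction
from itertools import combinations

terminals = ["D1", "D2", "G1", "G2", "G3"]   # two defective, three good

# Sample space: every unordered pair of terminals that could be shipped.
sample_space = [frozenset(pair) for pair in combinations(terminals, 2)]

# Random selection makes each of the sample points equally likely.
p_each = Fraction(1, len(sample_space))

# Event A: both selected terminals are nondefective.
A = [pair for pair in sample_space if all(t.startswith("G") for t in pair)]

# Axiom 3: add the probabilities of the sample points in A.
p_A = p_each * len(A)

if __name__ == "__main__":
    print(len(sample_space), len(A), p_A)
```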

Before we proceed, let us note that there are experiments for which the sample space 
is not countable and hence is not discrete. Suppose, for example, that the experiment 
consists of measuring the blood glucose level of a diabetic patient. The sample space 
for this experiment would contain an interval of real numbers, and any such interval 
contains an uncountable number of values. Thus, the sample space is not discrete. 
Situations like the latter will be discussed in Chapter 4. The remainder of this chapter 
is devoted to developing methods for calculating the probabilities of events defined 
on discrete sample spaces. 


Exercises 


Every person’s blood type is A, B, AB, or O. In addition, each individual either has the 
Rhesus (Rh) factor (+) or does not (−). A medical technician records a person’s blood type
and Rh factor. List the sample space for this experiment. 


The proportions of blood phenotypes, A, B, AB, and O, in the population of all Caucasians in 
the United States are approximately .41, .10, .04, and .45, respectively. A single Caucasian is 
chosen at random from the population. 


a List the sample space for this experiment.
b Make use of the information given above to assign probabilities to each of the simple events.
c What is the probability that the person chosen at random has either type A or type AB
blood?


A sample space consists of five simple events, $E_1$, $E_2$, $E_3$, $E_4$, and $E_5$.

a If $P(E_1) = P(E_2) = 0.15$, $P(E_3) = 0.4$, and $P(E_4) = 2P(E_5)$, find the probabilities of
$E_4$ and $E_5$.

b If $P(E_1) = 3P(E_2) = 0.3$, find the probabilities of the remaining simple events if you
know that the remaining simple events are equally probable.


A vehicle arriving at an intersection can turn right, turn left, or continue straight ahead. The 
experiment consists of observing the movement of a single vehicle through the intersection. 


a List the sample space for this experiment. 


b Assuming that all sample points are equally likely, find the probability that the vehicle turns. 


Americans can be quite suspicious, especially when it comes to government conspiracies. On 
the question of whether the U.S. Air Force has withheld proof of the existence of intelligent 
life on other planets, the proportions of Americans with varying opinions are given in the 
table. 
Opinion            Proportion
Very likely        .24
Somewhat likely    .24
Unlikely           .40
Other              .12


Suppose that one American is selected and his or her opinion is recorded. 


a What are the simple events for this experiment? 

b Are the simple events that you gave in part (a) all equally likely? If not, what are the 
probabilities that should be assigned to each? 

c What is the probability that the person selected finds it at least somewhat likely that the Air
Force is withholding information about intelligent life on other planets? 


A survey classified a large number of adults according to whether they were diagnosed as 
needing eyeglasses to correct their reading vision and whether they use eyeglasses when reading. 
The proportions falling into the four resulting categories are given in the following table: 


                 Uses Eyeglasses for Reading
Needs Glasses    Yes     No
Yes              .44     .14
No               .02     .40


If a single adult is selected from the large group, find the probabilities of the events defined 
below. The adult 


a needs glasses. 
b needs glasses but does not use them. 


c uses glasses whether the glasses are needed or not. 


An oil prospecting firm hits oil or gas on 10% of its drillings. If the firm drills two wells, 
the four possible simple events and three of their associated probabilities are as given in the 
accompanying table. Find the probability that the company will hit oil or gas 


a on the first drilling and miss on the second. 


b on at least one of the two drillings.


Simple    Outcome of          Outcome of
Event     First Drilling      Second Drilling     Probability
E_1       Hit (oil or gas)    Hit (oil or gas)    .01
E_2       Hit                 Miss                ?
E_3       Miss                Hit                 .09
E_4       Miss                Miss                .81


Of the volunteers coming into a blood center, 1 in 3 have $O^+$ blood, 1 in 15 have $O^-$, 1 in 3
have $A^+$, and 1 in 16 have $A^-$. The name of one person who previously has donated blood is
selected from the records of the center. What is the probability that the person selected has

a type $O^+$ blood?

b type O blood?

c type A blood?

d neither type A nor type O blood?


Hydraulic landing assemblies coming from an aircraft rework facility are each inspected for 
defects. Historical records indicate that 8% have defects in shafts only, 6% have defects in 
bushings only, and 2% have defects in both shafts and bushings. One of the hydraulic assemblies 
is selected randomly. What is the probability that the assembly has 


a a bushing defect?

b a shaft or bushing defect?

c exactly one of the two types of defects?

d neither type of defect?


Suppose two balanced coins are tossed and the upper faces are observed. 


a List the sample points for this experiment.

b Assign a reasonable probability to each sample point. (Are the sample points equally
likely?)

c Let A denote the event that exactly one head is observed and B the event that at least one
head is observed. List the sample points in both A and B.

d From your answer to part (c), find $P(A)$, $P(B)$, $P(A \cap B)$, $P(A \cup B)$, and $P(\bar{A} \cup B)$.


A business office orders paper supplies from one of three vendors, $V_1$, $V_2$, or $V_3$. Orders are to
be placed on two successive days, one order per day. Thus, $(V_2, V_3)$ might denote that vendor
$V_2$ gets the order on the first day and vendor $V_3$ gets the order on the second day.

a List the sample points in this experiment of ordering paper on two successive days.

b Assume the vendors are selected at random each day and assign a probability to each sample
point.

c Let A denote the event that the same vendor gets both orders and B the event that $V_2$ gets at
least one order. Find $P(A)$, $P(B)$, $P(A \cup B)$, and $P(A \cap B)$ by summing the probabilities
of the sample points in these events.


The following game was played on a popular television show. The host showed a contestant
three large curtains. Behind one of the curtains was a nice prize (maybe a new car) and behind
the other two curtains were worthless prizes (duds). The contestant was asked to choose one
curtain. If the curtains are identified by their prizes, they could be labeled $G$, $D_1$, and $D_2$ (Good
Prize, Dud1, and Dud2). Thus, the sample space for the contestant's choice is $S = \{G, D_1, D_2\}$.

a If the contestant has no idea which curtains hide the various prizes and selects a curtain at
random, assign reasonable probabilities to the simple events and calculate the probability
that the contestant selects the curtain hiding the nice prize.

b Before showing the contestant what was behind the curtain initially chosen, the game show
host would open one of the curtains and show the contestant one of the duds (he could
always do this because he knew the curtain hiding the good prize). He then offered the
contestant the option of changing from the curtain initially selected to the other remaining
unopened curtain. Which strategy maximizes the contestant's probability of winning the
good prize: stay with the initial choice or switch to the other curtain? In answering the
following sequence of questions, you will discover that, perhaps surprisingly, this question
can be answered by considering only the sample space above and using the probabilities
that you assigned to answer part (a).

i If the contestant chooses to stay with her initial choice, she wins the good prize if and
only if she initially chose curtain G. If she stays with her initial choice, what is the
probability that she wins the good prize?

ii If the host shows her one of the duds and she switches to the other unopened curtain,
what will be the result if she had initially selected G?

iii Answer the question in part (ii) if she had initially selected one of the duds.

iv If the contestant switches from her initial choice (as the result of being shown one of
the duds), what is the probability that the contestant wins the good prize?

v Which strategy maximizes the contestant's probability of winning the good prize: stay
with the initial choice or switch to the other curtain?

1. Exercises preceded by an asterisk are optional.


*2.21 If A and B are events, use the result derived in Exercise 2.5(a) and the Axioms in Definition
2.6 to prove that

$$P(A) = P(A \cap B) + P(A \cap \bar{B}).$$


*2.22 If A and B are events and $B \subset A$, use the result derived in Exercise 2.5(b) and the Axioms in
Definition 2.6 to prove that

$$P(A) = P(B) + P(A \cap \bar{B}).$$


2.23 If A and B are events and $B \subset A$, why is it "obvious" that $P(B) \le P(A)$?


2.24 Use the result in Exercise 2.22 and the Axioms in Definition 2.6 to prove the "obvious" result
in Exercise 2.23.


2.5 Calculating the Probability of an Event: The Sample-Point Method


Finding the probability of an event defined on a sample space that contains a finite or 
denumerable (countably infinite) set of sample points can be approached in two ways, 
the sample-point and the event-composition methods. Both methods use the sample 
space model, but they differ in the sequence of steps necessary to obtain a solution 
and in the tools that are used. Separation of the two procedures may not be palatable 
to the unity-seeking theorist, but it can be extremely useful to a beginner attempting to 
find the probability of an event. In this section we consider the sample-point method. 
The event-composition method requires additional results and will be presented in 
Section 2.9. 
The sample-point method is outlined in Section 2.4. The following steps are
used to find the probability of an event: 


1. Define the experiment and clearly determine how to describe one simple 
event. 

2. List the simple events associated with the experiment and test each to make
certain that it cannot be decomposed. This defines the sample space S. 

3. Assign reasonable probabilities to the sample points in S, making certain
that $P(E_i) \ge 0$ and $\sum P(E_i) = 1$.

4. Define the event of interest, A, as a specific collection of sample points. 
(A sample point is in A if A occurs when the sample point occurs. Test all 
sample points in S to identify those in A.) 

5. Find P(A) by summing the probabilities of the sample points in A. 
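
The five steps above translate almost directly into a short program. What follows is only a minimal sketch: the function name `probability_of_event` and the die-rolling illustration are our own assumptions and are not part of the text. The sample space is stored as a mapping from sample points to assigned probabilities (steps 2 and 3), the event is given as a test applied to each sample point (step 4), and the probabilities of the qualifying points are summed (step 5).

```python
from fractions import Fraction


def probability_of_event(sample_space, in_event):
    """Sum the probabilities of the sample points for which in_event is true.

    sample_space maps each sample point to its assigned probability (step 3);
    in_event is a test that identifies the sample points in A (step 4).
    """
    probabilities = list(sample_space.values())
    # Step 3 requires nonnegative probabilities that sum to 1.
    if any(p < 0 for p in probabilities) or sum(probabilities) != 1:
        raise ValueError("probabilities must be nonnegative and sum to 1")
    # Step 5: add up the probabilities of the sample points in A.
    return sum(p for point, p in sample_space.items() if in_event(point))


# Hypothetical illustration (not from the text): one roll of a balanced die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
p_even = probability_of_event(die, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```

Exact fractions are used so that the check that the probabilities sum to 1 is not disturbed by floating-point rounding.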


We will illustrate these steps with three examples. 


EXAMPLE 2.2 Consider the problem of selecting two applicants for a job out of a group of five and
imagine that the applicants vary in competence, 1 being the best, 2 second best, and
so on, for 3, 4, and 5. These ratings are of course unknown to the employer. Define
two events A and B as:

A: The employer selects the best and one of the two poorest
applicants (applicants 1 and 4 or 1 and 5).

B: The employer selects at least one of the two best.

Find the probabilities of these events.


Solution The steps are as follows:

1. The experiment involves randomly selecting two applicants out of five. Denote
the selection of applicants 3 and 5 by {3, 5}.

2. The ten simple events, with {i, j} denoting the selection of applicants i and
j, are

$E_1$: {1, 2}, $E_5$: {2, 3}, $E_8$: {3, 4}, $E_{10}$: {4, 5}.
$E_2$: {1, 3}, $E_6$: {2, 4}, $E_9$: {3, 5},
$E_3$: {1, 4}, $E_7$: {2, 5},
$E_4$: {1, 5},

3. A random selection of two out of five gives each pair an equal chance for
selection. Hence, we will assign each sample point the probability 1/10. That is,

$$P(E_i) = 1/10 = .1, \qquad i = 1, 2, \ldots, 10.$$

4. Checking the sample points, we see that B occurs whenever $E_1$, $E_2$, $E_3$, $E_4$,
$E_5$, $E_6$, or $E_7$ occurs. Hence, these sample points are included in B.

5. Finally, P(B) is equal to the sum of the probabilities of the sample points in
B, or

$$P(B) = \sum_{i=1}^{7} P(E_i) = \sum_{i=1}^{7} .1 = .7.$$

Similarly, we see that event $A = E_3 \cup E_4$ and that $P(A) = .1 + .1 = .2$.


The solution of this and similar problems would be of importance to a company 
personnel director. 
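
The listing in Example 2.2 can also be produced mechanically. The sketch below is one possible way to do this, assuming Python's standard `itertools.combinations` to generate the unordered pairs; the variable names are our own. It lists the ten sample points, assigns each the probability 1/10, and counts the points in A and in B.

```python
from fractions import Fraction
from itertools import combinations

applicants = [1, 2, 3, 4, 5]  # 1 is the best applicant, 5 the poorest

# Step 2: the ten unordered pairs {i, j} form the sample space.
sample_points = list(combinations(applicants, 2))

# Step 3: random selection makes every pair equally likely.
p_point = Fraction(1, len(sample_points))

# Step 4: A = the best applicant together with one of the two poorest.
event_a = [s for s in sample_points if 1 in s and (4 in s or 5 in s)]
# B = at least one of the two best applicants is selected.
event_b = [s for s in sample_points if 1 in s or 2 in s]

# Step 5: sum the (equal) probabilities of the points in each event.
p_a = p_point * len(event_a)
p_b = p_point * len(event_b)
print(len(sample_points), p_a, p_b)  # 10 1/5 7/10
```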


EXAMPLE 2.3 A balanced coin is tossed three times. Calculate the probability that exactly two of
the three tosses result in heads.


Solution The five steps of the sample-point method are as follows:

1. The experiment consists of observing the outcomes (heads or tails) for each of
three tosses of a coin. A simple event for this experiment can be symbolized by a
three-letter sequence of H's and T's, representing heads and tails, respectively.
The first letter in the sequence represents the observation on the first coin. The
second letter represents the observation on the second coin, and so on.

2. The eight simple events in S are

$E_1$: HHH, $E_3$: HTH, $E_5$: HTT, $E_7$: TTH,
$E_2$: HHT, $E_4$: THH, $E_6$: THT, $E_8$: TTT.

3. Because the coin is balanced, we would expect the simple events to be equally
likely; that is,

$$P(E_i) = 1/8, \qquad i = 1, 2, \ldots, 8.$$

4. The event of interest, A, is the event that exactly two of the tosses result in
heads. An examination of the sample points will verify that

$$A = \{E_2, E_3, E_4\}.$$

5. Finally,

$$P(A) = P(E_2) + P(E_3) + P(E_4) = 1/8 + 1/8 + 1/8 = 3/8.$$
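
As a check on Example 2.3, the eight sequences can be generated rather than written out by hand. This is a small sketch, assuming the standard `itertools.product` function; the names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each sample point is a three-letter sequence of H's and T's.
sample_points = ["".join(tosses) for tosses in product("HT", repeat=3)]

# A balanced coin makes the eight sequences equally likely.
p_point = Fraction(1, len(sample_points))

# A: exactly two of the three tosses result in heads.
event_a = [s for s in sample_points if s.count("H") == 2]
p_a = p_point * len(event_a)
print(sorted(event_a), p_a)  # ['HHT', 'HTH', 'THH'] 3/8
```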


Although the sample points in the sample spaces associated with Examples 2.2 
and 2.3 are equally likely, it is important to realize that sample points need not be 
equally likely. An example to illustrate this point follows. 
EXAMPLE 2.4 The odds are two to one that, when A and B play tennis, A wins. Suppose that A and
B play two matches. What is the probability that A wins at least one match?


Solution

1. The experiment consists of observing the winner (A or B) for each of two
matches. Let AB denote the event that player A wins the first match and player
B wins the second.

2. The sample space for the experiment consists of four sample points:

$E_1$: AA, $E_2$: AB, $E_3$: BA, $E_4$: BB

3. Because A has a better chance of winning any match, it does not seem appropriate
to assign equal probabilities to these sample points. As you will see in
Section 2.9, under certain conditions it is reasonable to make the following
assignment of probabilities:

$$P(E_1) = 4/9, \quad P(E_2) = 2/9, \quad P(E_3) = 2/9, \quad P(E_4) = 1/9.$$

Notice that, even though the probabilities assigned to the simple events are not
all equal, $P(E_i) \ge 0$, for $i = 1, 2, 3, 4$, and $\sum_{i=1}^{4} P(E_i) = 1$.

4. The event of interest is that A wins at least one game. Thus, if we denote the
event of interest as C, it is easily seen that

$$C = E_1 \cup E_2 \cup E_3.$$

5. Finally,

$$P(C) = P(E_1) + P(E_2) + P(E_3) = 4/9 + 2/9 + 2/9 = 8/9.$$
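
The same summation works when the sample points are not equally likely. The sketch below simply takes the probabilities assigned in step 3 of Example 2.4 as given (their justification is deferred to Section 2.9) and sums those belonging to C; the dictionary layout is our own choice.

```python
from fractions import Fraction

# Probabilities assigned to the four sample points in step 3.
sample_space = {
    "AA": Fraction(4, 9),
    "AB": Fraction(2, 9),
    "BA": Fraction(2, 9),
    "BB": Fraction(1, 9),
}
total = sum(sample_space.values())  # should equal 1

# C: player A wins at least one of the two matches.
p_c = sum(p for point, p in sample_space.items() if "A" in point)
print(total, p_c)  # 1 8/9
```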


The sample-point method for solving a probability problem is direct and powerful 
and in some respects is a bulldozer approach. It can be applied to find the probability of 
any event defined over a sample space containing a finite or countable set of sample 
points, but it is not resistant to human error. Common errors include incorrectly 
diagnosing the nature of a simple event and failing to list all the sample points in 
S. A second complication occurs because many sample spaces contain a very large 
number of sample points and a complete itemization is tedious and time consuming 
and might be practically impossible. 

Fortunately, many sample spaces generated by experimental data contain subsets 
of sample points that are equiprobable. (The sample spaces for Examples 2.1, 2.2, 
and 2.3 possess this property.) When this occurs, we need not list the points but may 
simply count the number in each subset. If such counting methods are inapplicable, 
an orderly method should be used to list the sample points (notice the listing schemes 
for Examples 2.1, 2.2, and 2.3). The listing of large numbers of sample points can be 
accomplished by using a computer. 

Tools that reduce the effort and error associated with the sample-point approach
for finding the probability of an event include orderliness, a computer, and the
mathematical theory of counting, called combinatorial analysis. Computer programming
and applications form a topic for separate study. The mathematical theory of
combinatorial analysis is also a broad subject, but some quite useful results can be given
succinctly. Hence, our next topic concerns some elementary results in combinatorial
analysis and their application to the sample-point approach for the solution of
probability problems.


Exercises 


A single car is randomly selected from among all of those registered at a local tag agency. 
What do you think of the following claim? “All cars are either Volkswagens or they are not. 
Therefore, the probability is 1/2 that the car selected is a Volkswagen.” 


Three imported wines are to be ranked from lowest to highest by a purported wine expert. That 
is, one wine will be identified as best, another as second best, and the remaining wine as worst. 


a Describe one sample point for this experiment.

b List the sample space.


c Assume that the "expert" really knows nothing about wine and randomly assigns ranks to 
the three wines. One of the wines is of much better quality than the others. What is the 
probability that the expert ranks the best wine no worse than second best? 


In Exercise 2.12 we considered a situation where cars entering an intersection each could turn 
right, turn left, or go straight. An experiment consists of observing two vehicles moving through 
the intersection. 


a How many sample points are there in the sample space? List them. 


b Assuming that all sample points are equally likely, what is the probability that at least one 
car turns left? 


c Again assuming equally likely sample points, what is the probability that at most one 
vehicle turns? 


Four equally qualified people apply for two identical positions in a company. One and only 
one applicant is a member of a minority group. The positions are filled by choosing two of the 
applicants at random. 


a List the possible outcomes for this experiment.
b Assign reasonable probabilities to the sample points. 


c Find the probability that the applicant from the minority group is selected for a position. 


Two additional jurors are needed to complete a jury for a criminal trial. There are six prospective 
jurors, two women and four men. Two jurors are randomly selected from the six available. 


a Define the experiment and describe one sample point. Assume that you need describe only 
the two jurors chosen and not the order in which they were selected. 
b List the sample space associated with this experiment.

c What is the probability that both of the jurors selected are women?


According to Webster's New Collegiate Dictionary, a divining rod is "a forked rod believed
to indicate [divine] the presence of water or minerals by dipping downward when held over a 
vein.” To test the claims of a divining rod expert, skeptics bury four cans in the ground, two 
empty and two filled with water. The expert is led to the four cans and told that two contain 
water. He uses the divining rod to test each of the four cans and decide which two contain water. 


a List the sample space for this experiment.

b If the divining rod is completely useless for locating water, what is the probability that the
expert will correctly identify (by guessing) both of the cans containing water?

The Bureau of the Census reports that the median family income for all families in the United
States during the year 2003 was $43,318. That is, half of all American families had incomes 
exceeding this amount, and half had incomes equal to or below this amount. Suppose that four 
families are surveyed and that each one reveals whether its income exceeded $43,318 in 2003. 


a List the points in the sample space. 


b Identify the simple events in each of the following events: 


A: At least two had incomes exceeding $43,318.
B: Exactly two had incomes exceeding $43,318. 
C: Exactly one had income less than or equal to $43,318. 


c Make use of the given interpretation for the median to assign probabilities to the simple 
events and find P(A), P(B), and P(C). 


Patients arriving at a hospital outpatient clinic can select one of three stations for service. 
Suppose that physicians are assigned randomly to the stations and that the patients therefore 
have no station preference. Three patients arrive at the clinic and their selection of stations is 
observed. 


a List the sample points for the experiment.
b Let A be the event that each station receives a patient. List the sample points in A. 


c Make a reasonable assignment of probabilities to the sample points and find P(A). 


A boxcar contains six complex electronic systems. Two of the six are to be randomly selected 
for thorough testing and then classified as defective or not defective. 


a If two of the six systems are actually defective, find the probability that at least one of the
two systems tested will be defective. Find the probability that both are defective. 


b If four of the six systems are actually defective, find the probabilities indicated in part (a). 


A retailer sells only two styles of stereo consoles, and experience shows that these are in equal 
demand. Four customers in succession come into the store to order stereos. The retailer is 
interested in their preferences. 


a List the possibilities for preference arrangements among the four customers (that is, list
the sample space).

b Assign probabilities to the sample points.

c Let A denote the event that all four customers prefer the same style. Find P(A).


Tools for Counting Sample Points 


This section presents some useful results from the theory of combinatorial analysis 
and illustrates their application to the sample-point method for finding the probability 
of an event. In many cases, these results enable you to count the total number of 
sample points in the sample space S and in an event of interest, thereby providing 
a confirmation of your listing of simple events. When the number of simple events 
in a sample space is very large and manual enumeration of every sample point is 
tedious or even impossible, counting the number of points in the sample space and in 
the event of interest may be the only efficient way to calculate the probability of an 
event. Indeed, if a sample space contains N equiprobable sample points and an event
A contains exactly $n_a$ sample points, it is easily seen that $P(A) = n_a/N$.


The first result from combinatorial analysis that we present, often called the mn
rule, is stated as follows:


THEOREM 2.1 With m elements $a_1, a_2, \ldots, a_m$ and n elements $b_1, b_2, \ldots, b_n$, it is possible
to form $mn = m \times n$ pairs containing one element from each group.


Proof Verification of the theorem can be seen by observing the rectangular table in
Figure 2.9. There is one square in the table for each $(a_i, b_j)$ pair and hence a total
of $m \times n$ squares.


FIGURE 2.9 Table indicating the number of pairs $(a_i, b_j)$


The mn rule can be extended to any number of sets. Given three sets of elements,
$a_1, a_2, \ldots, a_m$; $b_1, b_2, \ldots, b_n$; and $c_1, c_2, \ldots, c_p$, the number of distinct triplets
containing one element from each set is equal to mnp. The proof of the theorem for
three sets involves two applications of Theorem 2.1. We think of the first set as an
$(a_i, b_j)$ pair and unite each of these pairs with elements of the third set, $c_1, c_2, \ldots, c_p$.
Theorem 2.1 implies that there are mn pairs $(a_i, b_j)$. Because there are p elements
$c_1, c_2, \ldots, c_p$, another application of Theorem 2.1 implies that there are $(mn)(p) =
mnp$ triplets $a_i b_j c_k$.


EXAMPLE 2.5 An experiment involves tossing a pair of dice and observing the numbers on the upper
faces. Find the number of sample points in S, the sample space for the experiment.


Solution A sample point for this experiment can be represented symbolically as an ordered
pair of numbers representing the outcomes on the first and second die, respectively.
Thus, (4, 5) denotes the event that the uppermost face on the first die was a 4 and on
the second die, a 5. The sample space S consists of the set of all possible pairs (x, y),
where x and y are both integers between 1 and 6.

The first die can result in one of six numbers. These represent $a_1, a_2, \ldots, a_6$.
Likewise, the second die can fall in one of six ways, and these correspond to
$b_1, b_2, \ldots, b_6$. Then $m = n = 6$ and the total number of sample points in S is
$mn = (6)(6) = 36$.
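
For a sample space this small, the mn rule can be confirmed by brute force. The following sketch, which assumes the standard `itertools.product` function and uses names of our own choosing, lists every ordered pair and compares the length of the list with the product $m \times n$.

```python
from itertools import product

first_die = range(1, 7)   # a_1, ..., a_6
second_die = range(1, 7)  # b_1, ..., b_6

# Enumerate every ordered pair (x, y) in the sample space S.
pairs = list(product(first_die, second_die))

# The mn rule predicts m * n pairs without listing them.
count_by_rule = len(first_die) * len(second_die)
print(len(pairs), count_by_rule)  # 36 36
```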
EXAMPLE 2.6 Refer to the coin-tossing experiment in Example 2.3. We found for this example that
the total number of sample points was eight. Use the extension of the mn rule to
confirm this result.


Solution Each sample point in S was identified by a sequence of three letters, where each
position in the sequence contained one of two letters, an H or a T. The problem
therefore involves the formation of triples, with an element (an H or a T) from each
of three sets. For this example the sets are identical and all contain two elements
(H and T). Thus, the number of elements in each set is $m = n = p = 2$, and the
total number of triples that can be formed is $mnp = (2)^3 = 8$.


EXAMPLE 2.7 Consider an experiment that consists of recording the birthday for each of 20 randomly
selected persons. Ignoring leap years and assuming that there are only 365 possible
distinct birthdays, find the number of points in the sample space S for this experiment.
If we assume that each of the possible sets of birthdays is equiprobable, what is the
probability that each person in the 20 has a different birthday?


Solution Number the days of the year 1, 2, ..., 365. A sample point for this experiment can be
represented by an ordered sequence of 20 numbers, where the first number denotes
the number of the day that is the first person's birthday, the second number denotes the
number of the day that is the second person's birthday, and so on. We are concerned
with the number of 20-tuples that can be formed, selecting a number representing
one of the 365 days in the year from each of 20 sets. The sets are all identical, and
each contains 365 elements. Repeated applications of the mn rule tell us there are
$(365)^{20}$ such 20-tuples. Thus, the sample space S contains $N = (365)^{20}$ sample
points. Although we could not feasibly list all the sample points, if we assume them
to be equiprobable, $P(E_i) = 1/(365)^{20}$ for each simple event.

If we denote the event that each person has a different birthday by A, the probability
of A can be calculated if we can determine $n_a$, the number of sample points in A.
A sample point is in A if the corresponding 20-tuple is such that no two positions
contain the same number. Thus, the set of numbers from which the first element in a
20-tuple in A can be selected contains 365 numbers, the set from which the second
element can be selected contains 364 numbers (all but the one selected for the first
element), the set from which the third can be selected contains 363 (all but the two
selected for the first two elements), ..., and the set from which the 20th element can
be selected contains 346 elements (all but those selected for the first 19 elements).
An extension of the mn rule yields

$$n_a = (365) \times (364) \times \cdots \times (346).$$

Finally, we may determine that

$$P(A) = \frac{n_a}{N} = \frac{365 \times 364 \times \cdots \times 346}{(365)^{20}} = .5886.$$
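
The arithmetic in Example 2.7 is tedious by hand but immediate on a computer. The sketch below evaluates $n_a$ and $N$ exactly as integers and then forms their ratio; it assumes Python 3.8 or later for `math.prod`, and the variable names are our own.

```python
from math import prod  # available in Python 3.8 and later

people = 20
days = 365

# n_a = 365 * 364 * ... * 346: one fewer choice for each successive person.
n_a = prod(range(days, days - people, -1))

# N = 365^20: every person may have any of the 365 birthdays.
n_total = days ** people

p_a = n_a / n_total
print(round(p_a, 4))  # 0.5886
```

Changing `people` shows how quickly the probability of all-distinct birthdays falls as the group grows.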


Notice that for Examples 2.5 and 2.6 the numbers of sample points in the respective
sample spaces are both relatively small and that listings for these sample spaces could
easily be written down. For instances like these, the mn rule provides a simple method
to verify that the sample spaces contain the correct number of points. In contrast, it
is not feasible to list the sample space in Example 2.7. However, the mn rule can be
used to count the number of sample points in S and in the event of interest, permitting
calculation of the probability of the event.

We have seen that the sample points associated with an experiment often can be
represented symbolically as a sequence of numbers or symbols. In some instances, it
will be clear that the total number of sample points equals the number of distinct ways
that the respective symbols can be arranged in sequence. The following theorem can
be used to determine the number of ordered arrangements that can be formed.


DEFINITION 2.7 An ordered arrangement of r distinct objects is called a permutation. The number
of ways of ordering n distinct objects taken r at a time will be designated
by the symbol $P_r^n$.


THEOREM 2.2

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$


Proof We are concerned with the number of ways of filling r positions with n distinct
objects. Applying the extension of the mn rule, we see that the first object can
be chosen in one of n ways. After the first is chosen, the second can be chosen
in $(n - 1)$ ways, the third in $(n - 2)$, and the rth in $(n - r + 1)$ ways. Hence,
the total number of distinct arrangements is

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1).$$

Expressed in terms of factorials,

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1)\,\frac{(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!},$$

where $n! = n(n-1)\cdots(2)(1)$ and $0! = 1$.


EXAMPLE 2.8 The names of 3 employees are to be randomly drawn, without replacement, from a
bowl containing the names of 30 employees of a small company. The person whose
name is drawn first receives $100, and the individuals whose names are drawn second
and third receive $50 and $25, respectively. How many sample points are associated
with this experiment?


Solution Because the prizes awarded are different, the number of sample points is the number
of ordered arrangements of r = 3 out of the possible n = 30 names. Thus, the number
of sample points in S is

$$P_3^{30} = \frac{30!}{27!} = (30)(29)(28) = 24{,}360.$$
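
The count in Example 2.8 can be cross-checked in two ways: by evaluating the factorial formula of Theorem 2.2 and by actually generating every ordered arrangement. This is a sketch only; it assumes Python 3.8 or later for `math.perm`, and it stands in the integers 0 through 29 for the 30 names.

```python
from itertools import permutations
from math import factorial, perm  # math.perm needs Python 3.8 or later

n, r = 30, 3

# Theorem 2.2: n! / (n - r)!
by_formula = factorial(n) // factorial(n - r)

# Brute force: count every ordered draw of r distinct names out of n.
by_enumeration = sum(1 for _ in permutations(range(n), r))

print(by_formula, perm(n, r), by_enumeration)  # 24360 24360 24360
```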
EXAMPLE 2.9 Suppose that an assembly operation in a manufacturing plant involves four steps,
which can be performed in any sequence. If the manufacturer wishes to compare
the assembly time for each of the sequences, how many different sequences will be
involved in the experiment?


Solution The total number of sequences equals the number of ways of arranging the n = 4
steps taken r = 4 at a time, or

$$P_4^4 = \frac{4!}{(4-4)!} = \frac{4!}{0!} = 24.$$


The next result from combinatorial analysis can be used to determine the number
of subsets of various sizes that can be formed by partitioning a set of n distinct objects
into k nonoverlapping groups.


THEOREM 2.3 The number of ways of partitioning n distinct objects into k distinct groups
containing $n_1, n_2, \ldots, n_k$ objects, respectively, where each object appears in
exactly one group and $\sum_{i=1}^{k} n_i = n$, is

$$N = \binom{n}{n_1\; n_2\; \cdots\; n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}.$$


Proof N is the number of distinct arrangements of n objects in a row for a case in
which rearrangement of the objects within a group does not count. For example,
the letters a to l are arranged in three groups, where $n_1 = 3$, $n_2 = 4$, and
$n_3 = 5$:

abc|defg|hijkl

is one such arrangement.

The number of distinct arrangements of the n objects, assuming all objects
are distinct, is $P_n^n = n!$ (from Theorem 2.2). Then $P_n^n$ equals the number of
ways of partitioning the n objects into k groups (ignoring order within groups)
multiplied by the number of ways of ordering the $n_1, n_2, \ldots, n_k$ elements
within each group. This application of the extended mn rule gives

$$P_n^n = (N) \cdot (n_1!\, n_2!\, n_3! \cdots n_k!),$$

where $n_i!$ is the number of distinct arrangements of the $n_i$ objects in group i.
Solving for N, we have

$$N = \frac{n!}{n_1!\, n_2! \cdots n_k!} \equiv \binom{n}{n_1\; n_2\; \cdots\; n_k}.$$


The terms $\binom{n}{n_1\; n_2\; \cdots\; n_k}$ are often called multinomial coefficients because they occur
in the expansion of the multinomial term $y_1 + y_2 + \cdots + y_k$ raised to the nth power:

$$(y_1 + y_2 + \cdots + y_k)^n = \sum \binom{n}{n_1\; n_2\; \cdots\; n_k}\, y_1^{n_1} y_2^{n_2} \cdots y_k^{n_k},$$

where this sum is taken over all $n_i = 0, 1, \ldots, n$ such that $n_1 + n_2 + \cdots + n_k = n$.


EXAMPLE 2.10

A labor dispute has arisen concerning the distribution of 20 laborers to four different
construction jobs. The first job (considered to be very undesirable) required 6 laborers;
the second, third, and fourth utilized 4, 5, and 5 laborers, respectively. The dispute
arose over an alleged random distribution of the laborers to the jobs that placed all 4
members of a particular ethnic group on job 1. In considering whether the assignment
represented injustice, a mediation panel desired the probability of the observed event.
Determine the number of sample points in the sample space S for this experiment.
That is, determine the number of ways the 20 laborers can be divided into groups of
the appropriate sizes to fill all of the jobs. Find the probability of the observed event
if it is assumed that the laborers are randomly assigned to jobs.

Solution

The number of ways of assigning the 20 laborers to the four jobs is equal to the number
of ways of partitioning the 20 into four groups of sizes $n_1 = 6$, $n_2 = 4$, $n_3 = n_4 = 5$.
Then

\[ N = \binom{20}{6\; 4\; 5\; 5} = \frac{20!}{6!\, 4!\, 5!\, 5!}. \]


By a random assignment of laborers to the jobs, we mean that each of the N
sample points has probability equal to 1/N. If A denotes the event of interest and $n_a$
the number of sample points in A, the sum of the probabilities of the sample points in
A is $P(A) = n_a(1/N) = n_a/N$. The number of sample points in A, $n_a$, is the number
of ways of assigning laborers to the four jobs with the 4 members of the ethnic group
all going to job 1. The remaining 16 laborers need to be assigned to the remaining
jobs. Because there remain two openings for job 1, this can be done in

\[ n_a = \binom{16}{2\; 4\; 5\; 5} = \frac{16!}{2!\, 4!\, 5!\, 5!} \]

ways. It follows that

\[ P(A) = \frac{n_a}{N} = 0.0031. \]

Thus, if laborers are randomly assigned to jobs, the probability that the 4 members
of the ethnic group all go to the undesirable job is very small. There is reason to doubt
that the jobs were randomly assigned.
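The arithmetic in Example 2.10 can be reproduced with exact integer and rational arithmetic. This is only a sketch; the helper `multinomial` and the variable names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import factorial


def multinomial(*group_sizes):
    """n! / (n_1! n_2! ... n_k!) with n = sum(group_sizes)."""
    count = factorial(sum(group_sizes))
    for size in group_sizes:
        count //= factorial(size)
    return count


N = multinomial(6, 4, 5, 5)     # all assignments of the 20 laborers
n_a = multinomial(2, 4, 5, 5)   # the 4 group members already placed on job 1
p_a = Fraction(n_a, N)

print(N, n_a)                   # 9777287520 30270240
print(p_a, round(float(p_a), 4))  # 1/323 0.0031
```

The exact value 1/323 rounds to the 0.0031 reported in the solution.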


In many situations the sample points are identified by an array of symbols in which
the arrangement of symbols is unimportant. The sample points for the selection of
applicants, Example 2.2, imply a selection of two applicants out of five. Each sample
point is identified as a pair of symbols, and the order of the symbols used to identify
the sample points is irrelevant.

DEFINITION 2.8

The number of combinations of n objects taken r at a time is the number of
subsets, each of size r, that can be formed from the n objects. This number will
be denoted by $C_r^n$ or $\binom{n}{r}$.

THEOREM 2.4

The number of unordered subsets of size r chosen (without replacement) from
n available objects is

\[ \binom{n}{r} = C_r^n = \frac{P_r^n}{r!} = \frac{n!}{r!\,(n - r)!}. \]

Proof

The selection of r objects from a total of n is equivalent to partitioning the n
objects into k = 2 groups, the r selected, and the (n - r) remaining. This is a
special case of the general partitioning problem dealt with in Theorem 2.3. In
the present case, $k = 2$, $n_1 = r$, and $n_2 = (n - r)$ and, therefore,

\[ \binom{n}{r} = C_r^n = \binom{n}{r\;\; n - r} = \frac{n!}{r!\,(n - r)!}. \]


The terms $\binom{n}{r}$ are generally referred to as binomial coefficients because they occur
in the binomial expansion

\[ (x + y)^n = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n} x^0 y^n
= \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i. \]
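Both the factorial formula of Theorem 2.4 and the binomial expansion can be spot-checked numerically. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `binomial_sum` and the test values are arbitrary choices made for illustration.

```python
from math import comb, factorial


def binomial_sum(x, y, n):
    """Right-hand side of the binomial expansion:
    sum over i of C(n, i) * x**(n - i) * y**i."""
    return sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))


n, r = 5, 2
# Theorem 2.4: C(n, r) = n! / (r! (n - r)!)
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))
# Binomial expansion evaluated at x = 2, y = 3
assert binomial_sum(2, 3, n) == (2 + 3) ** n
print(comb(n, r), binomial_sum(2, 3, n))  # 10 3125
```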


EXAMPLE 2.11

Find the number of ways of selecting two applicants out of five and hence the total
number of sample points in S for Example 2.2.

Solution

\[ \binom{5}{2} = \frac{5!}{2!\, 3!} = 10. \]

(Notice that this agrees with the number of sample points listed in Example 2.2.)


EXAMPLE 2.12

Let A denote the event that exactly one of the two best applicants appears in a selection
of two out of five. Find the number of sample points in A and P(A).

Solution

Let $n_a$ denote the number of sample points in A. Then $n_a$ equals the number of
ways of selecting one of the two best (call this number m) times the number of
ways of selecting one of the three low-ranking applicants (call this number n). Then
$m = \binom{2}{1}$, $n = \binom{3}{1}$, and applying the mn rule,

\[ n_a = \binom{2}{1} \cdot \binom{3}{1} = \frac{2!}{1!\, 1!} \cdot \frac{3!}{1!\, 2!} = 6. \]

(This number can be verified by counting the sample points in A from the listing in
Example 2.2.)

In Example 2.11 we found the total number of sample points in S to be N = 10.
If each selection is equiprobable, $P(E_i) = 1/10 = .1$, $i = 1, 2, \ldots, 10$, and

\[ P(A) = \sum_{E_i \subset A} P(E_i) = \sum_{E_i \subset A} (.1) = 6(.1) = .6. \]
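The counts in Examples 2.11 and 2.12 are small enough to verify by listing the sample points. In this sketch the applicants are labeled 1 to 5 by rank, with 1 and 2 taken to be the two best; that labeling is an assumption made only for the illustration.

```python
from fractions import Fraction
from itertools import combinations

applicants = [1, 2, 3, 4, 5]   # labels by rank (assumed for illustration)
best = {1, 2}                  # the two best applicants

# Sample space S: all unordered selections of two applicants.
sample_space = list(combinations(applicants, 2))

# Event A: exactly one of the two best is selected.
event_a = [pair for pair in sample_space if len(best & set(pair)) == 1]

p_a = Fraction(len(event_a), len(sample_space))
print(len(sample_space), len(event_a), p_a)  # 10 6 3/5
```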


EXAMPLE 2.13

A company orders supplies from M distributors and wishes to place n orders (n < M).
Assume that the company places the orders in a manner that allows every distributor
an equal chance of obtaining any one order and there is no restriction on the number
of orders that can be placed with any distributor. Find the probability that a particular
distributor, say, distributor I, gets exactly k orders (k < n).

Solution

Because any of the M distributors can be selected to receive any one of the orders,
there are M ways that each order can be placed, and the number of different ways
that the n orders can be placed is $M \cdot M \cdot M \cdots M = (M)^n$. Consequently, there are
$(M)^n$ sample points in S. All these points are equally likely; hence $P(E_i) = 1/(M)^n$.
Let A denote the event that distributor I receives exactly k orders from among the n.
The k orders assigned to distributor I can be chosen from the n in $\binom{n}{k}$ ways. It remains to
determine the number of ways the remaining (n - k) orders can be assigned to the other
M - 1 distributors. Because each of these (n - k) orders can go to any of the (M - 1)
distributors, this assignment can be made in $(M - 1)^{n-k}$ ways. Thus, A contains

\[ n_a = \binom{n}{k} (M - 1)^{n-k} \]

sample points, and because the sample points are equally likely,

\[ P(A) = \sum_{E_i \subset A} P(E_i) = \sum_{E_i \subset A} \frac{1}{M^n} = \frac{\binom{n}{k} (M - 1)^{n-k}}{M^n}. \]
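For small values of M and n, the formula of Example 2.13 can be compared with a brute-force listing of all $M^n$ sample points. The sketch below is illustrative only: the function names are ours, distributor I is represented by the label 0, and M = 4, n = 3 are arbitrary values not taken from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_formula(M, n, k):
    """P(distributor I gets exactly k of the n orders), closed form."""
    return Fraction(comb(n, k) * (M - 1) ** (n - k), M ** n)


def prob_enumerated(M, n, k):
    """Same probability by listing all M**n equally likely sample points.
    Each point records which distributor (0, ..., M-1) receives each order."""
    hits = sum(1 for orders in product(range(M), repeat=n)
               if orders.count(0) == k)
    return Fraction(hits, M ** n)


M, n = 4, 3  # small illustrative values
for k in range(n + 1):
    assert prob_formula(M, n, k) == prob_enumerated(M, n, k)
print(prob_formula(M, n, 1))  # 27/64
```

Summing the formula over k = 0, 1, ..., n gives 1, as it must, because the events "exactly k orders" partition the sample space.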


Theorems 2.1 through 2.4 provide a few of the many useful counting rules found 
in the theory of combinatorial analysis. A few additional theorems appear in the 
exercises at the end of the chapter. If you are interested in extending your knowledge 
of combinatorial analysis, refer to one of the numerous texts on this subject. 

We will next direct our attention to the concept of conditional probability. Con- 
ditional probability plays an important role in the event-composition approach for 
finding the probability of an event and is sometimes useful in finding the probabilities 
of sample points (for sample spaces with sample points that are not equally likely).


Exercises


An airline has six flights from New York to California and seven flights from California to 
Hawaii per day. If the flights are to be made on separate days, how many different flight 
arrangements can the airline offer from New York to Hawaii? 


An assembly operation in a manufacturing plant requires three steps that can be performed in 
any sequence. How many different ways can the assembly be performed? 


A businesswoman in Philadelphia is preparing an itinerary for a visit to six major cities. The 
distance traveled, and hence the cost of the trip, will depend on the order in which she plans 
her route. 


a How many different itineraries (and trip costs) are possible? 

b If the businesswoman randomly selects one of the possible itineraries and Denver and San 
Francisco are two of the cities that she plans to visit, what is the probability that she will 
visit Denver before San Francisco? 


An upscale restaurant offers a special fixe prix menu in which, for a fixed dinner cost, a diner can 
select from four appetizers, three salads, four entrees, and five desserts. How many different 
dinners are available if a dinner consists of one appetizer, one salad, one entree, and one 
dessert? 


An experiment consists of tossing a pair of dice. 


a Use the combinatorial theorems to determine the number of sample points in the sample
space S.


b Find the probability that the sum of the numbers appearing on the dice is equal to 7. 


A brand of automobile comes in five different styles, with four types of engines, with two types 
of transmissions, and in eight colors. 


a How many autos would a dealer have to stock if he included one for each style-engine- 
transmission combination? 

b How many would a distribution center have to carry if all colors of cars were stocked for 
each combination in part (a)? 


How many different seven-digit telephone numbers can be formed if the first digit cannot be 
zero? 


A personnel director for a corporation has hired ten new engineers. If three (distinctly different) 
positions are open at a Cleveland plant, in how many ways can she fill the positions? 


A fleet of nine taxis is to be dispatched to three airports in such a way that three go to airport 
A, five go to airport B, and one goes to airport C. In how many distinct ways can this be 
accomplished? 


Refer to Exercise 2.43. Assume that taxis are allocated to airports at random.

a If exactly one of the taxis is in need of repair, what is the probability that it is dispatched
to airport C?


b If exactly three of the taxis are in need of repair, what is the probability that every airport 
receives one of the taxis requiring repairs? 


Suppose that we wish to expand $(x + y + z)^{17}$. What is the coefficient of $x^2 y^5 z^{10}$?


Ten teams are playing in a basketball tournament. In the first round, the teams are randomly
assigned to games 1, 2, 3, 4 and 5. In how many ways can the teams be assigned to the games?


Refer to Exercise 2.46. If 2n teams are to be assigned to games 1, 2, ..., n, in how many ways
can the teams be assigned to the games?


If we wish to expand (x + у), what is the coefficient of x?y?? What is the coefficient of 
xy? 


Students attending the University of Florida can select from 130 major areas of study. A
student's major is identified in the registrar's records with a two- or three-letter code (for
example, statistics majors are identified by STA, math majors by MS). Some students opt for
a double major and complete the requirements for both of the major areas before graduation.
The registrar was asked to consider assigning these double majors a distinct two- or three-letter
code so that they could be identified through the student records’ system.


a What is the maximum number of possible double majors available to University of Florida 
students? 

b If any two- or three-letter code is available to identify majors or double majors, how many 
major codes are available? 

c How many major codes are required to identify students who have either a single major or 
a double major? 

d Are there enough major codes available to identify all single and double majors at the 
University of Florida? 


Probability played a role in the rigging of the April 24, 1980, Pennsylvania state lottery (Los 
Angeles Times, September 8, 1980). To determine each digit of the three-digit winning number, 
each of the numbers 0, 1, 2, ..., 9 is placed on a Ping-Pong ball, the ten balls are blown into 
a compartment, and the number selected for the digit is the one on the ball that floats to the 
top of the machine. To alter the odds, the conspirators injected a liquid into all balls used in 
the game except those numbered 4 and 6, making it almost certain that the lighter balls would 
be selected and determine the digits in the winning number. Then they bought lottery tickets 
bearing the potential winning numbers. How many potential winning numbers were there (666 
was the eventual winner)? 


A local fraternity is conducting a raffle where 50 tickets are to be sold—one per customer. 
There are three prizes to be awarded. If the four organizers of the raffle each buy one ticket, 
what is the probability that the four organizers win 


a all of the prizes?
b exactly two of the prizes?
c exactly one of the prizes?
d none of the prizes?


An experimenter wishes to investigate the effect of three variables—pressure, temperature, 
and the type of catalyst—on the yield in a refining process. If the experimenter intends to 
use three settings each for temperature and pressure and two types of catalysts, how many 
experimental runs will have to be conducted if he wishes to run all possible combinations of 
pressure, temperature, and types of catalysts? 


Five firms, $F_1, F_2, \ldots, F_5$, each offer bids on three separate contracts, $C_1$, $C_2$, and $C_3$. Any one
firm will be awarded at most one contract. The contracts are quite different, so an assignment
of $C_1$ to $F_1$, say, is to be distinguished from an assignment of $C_2$ to $F_1$.

a How many sample points are there altogether in this experiment involving assignment of
contracts to the firms? (No need to list them all.)


b Under the assumption of equally likely sample points, find the probability that F3 is awarded 
a contract. 


A group of three undergraduate and five graduate students are available to fill certain stu- 
dent government posts. If four students are to be randomly selected from this group, find the 
probability that exactly two undergraduates will be among the four chosen. 


A study is to be conducted in a hospital to determine the attitudes of nurses toward various 
administrative procedures. A sample of 10 nurses is to be selected from a total of the 90 nurses 
employed by the hospital. 


a How many different samples of 10 nurses can be selected? 


b Twenty of the 90 nurses are male. If 10 nurses are randomly selected from those employed 
by the hospital, what is the probability that the sample of ten will include exactly 4 male 
(and 6 female) nurses? 


A student prepares for an exam by studying a list of ten problems. She can solve six of them. 
For the exam, the instructor selects five problems at random from the ten on the list given 
to the students. What is the probability that the student can solve all five problems on the 
exam? 


Two cards are drawn from a standard 52-card playing deck. What is the probability that the 
draw will yield an ace and a face card? 


Five cards are dealt from a standard 52-card deck. What is the probability that we draw

a 3 aces and 2 kings?
b a "full house" (3 cards of one kind, 2 cards of another kind)?


Five cards are dealt from a standard 52-card deck. What is the probability that we draw

a 1 ace, 1 two, 1 three, 1 four, and 1 five (this is one way to get a "straight")?
b any straight?


Refer to Example 2.7. Suppose that we record the birthday for each of n randomly selected 
persons. 


a Give an expression for the probability that none share the same birthday. 


b What is the smallest value of n so that the probability is at least .5 that at least two people
share a birthday?


Suppose that we ask n randomly selected people whether they share your birthday.


a Give an expression for the probability that no one shares your birthday (ignore leap years). 


b How many people do we need to select so that the probability is at least .5 that at least one 
shares your birthday? 


A manufacturer has nine distinct motors in stock, two of which came from a particular supplier. 
The motors must be divided among three production lines, with three motors going to each 
line. If the assignment of motors to lines is random, find the probability that both motors from 
the particular supplier are assigned to the first line. 


The eight-member Human Relations Advisory Board of Gainesville, Florida, considered
the complaint of a woman who claimed discrimination, based on sex, on the part of a local
company. The board, composed of five women and three men, voted 5-3 in favor of the plaintiff,
the five women voting in favor of the plaintiff, the three men against. The attorney representing 
the company appealed the board’s decision by claiming sex bias on the part of the board mem- 
bers. If there was no sex bias among the board members, it might be reasonable to conjecture 
that any group of five board members would be as likely to vote for the complainant as any 
other group of five. If this were the case, what is the probability that the vote would split along 
sex lines (five women for, three men against)? 


A balanced die is tossed six times, and the number on the uppermost face is recorded each 
time. What is the probability that the numbers recorded are 1, 2, 3, 4, 5, and 6 in any order? 


Refer to Exercise 2.64. Suppose that the die has been altered so that the faces are 1, 2, 3, 4, 5, 
and 5. If the die is tossed five times, what is the probability that the numbers recorded are 1, 2,
3, 4, and 5 in any order? 


Refer to Example 2.10. What is the probability that 


a an ethnic group member is assigned to each type of job?


b no ethnic group member is assigned to a type 4 job? 


Refer to Example 2.13. Suppose that the number of distributors is M = 10 and that there are
n = 7 orders to be placed. What is the probability that

a all of the orders go to different distributors?
*b distributor I gets exactly two orders and distributor II gets exactly three orders?
*c distributors I, II, and III get exactly two, three, and one order(s), respectively?


Show that, for any integer n > 1,

a $\binom{n}{n} = 1$. Interpret this result.
b $\binom{n}{0} = 1$. Interpret this result.
c $\binom{n}{r} = \binom{n}{n - r}$. Interpret this result.
d $\sum_{i=0}^{n} \binom{n}{i} = 2^n$. [Hint: Consider the binomial expansion of $(x + y)^n$ with x = y = 1.]


Prove that $\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$.


Consider the situation where n items are to be partitioned into k < n distinct subsets. The
multinomial coefficients $\binom{n}{n_1\; n_2\; \cdots\; n_k}$ provide the number of distinct partitions where $n_1$ items
are in group 1, $n_2$ are in group 2, ..., $n_k$ are in group k. Prove that the total number of distinct
partitions equals $k^n$. [Hint: Recall Exercise 2.68(d).]


Conditional Probability 
and the Independence of Events 


The probability of an event will sometimes depend upon whether we know that other 
events have occurred. For example, Florida sport fishermen are vitally interested 
in the probability of rain. The probability of rain on a given day, ignoring the daily 
atmospheric conditions or any other events, is the fraction of days in which rain occurs 
over a long period of time. This is the unconditional probability of the event “rain on 
a given day.” Now suppose that we wish to consider the probability of rain tomorrow.
It has rained almost continuously for two days in succession, and a tropical storm
is heading up the coast. We have extra information related to whether or not it rains
tomorrow and are interested in the conditional probability that it will rain given this
information. A Floridian would tell you that the conditional probability of rain (given
that it has rained two preceding days and that a tropical storm is predicted) is much
larger than the unconditional probability of rain.

The unconditional probability of a 1 in the toss of one balanced die is 1/6. If we
know that an odd number has fallen, the number on the die must be 1, 3, or 5 and
the relative frequency of occurrence of a 1 is 1/3. The conditional probability of an
event is the probability (relative frequency of occurrence) of the event given the fact
that one or more events have already occurred. A careful perusal of this example will
indicate the agreement of the following definition with the relative frequency concept
of probability.

DEFINITION 2.9

The conditional probability of an event A, given that an event B has occurred,
is equal to

\[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \]

provided P(B) > 0. [The symbol P(A|B) is read “probability of A given B.”]


Further confirmation of the consistency of Definition 2.9 with the relative frequency
concept of probability can be obtained from the following construction. Suppose that
an experiment is repeated a large number, N, of times, resulting in both A and B,
$A \cap B$, $n_{11}$ times; A and not B, $A \cap \bar{B}$, $n_{21}$ times; B and not A, $\bar{A} \cap B$, $n_{12}$ times; and
neither A nor B, $\bar{A} \cap \bar{B}$, $n_{22}$ times. These results are contained in Table 2.1.

Note that $n_{11} + n_{12} + n_{21} + n_{22} = N$. Then it follows that

\[ P(A) \approx \frac{n_{11} + n_{21}}{N}, \quad P(B) \approx \frac{n_{11} + n_{12}}{N}, \quad P(A|B) \approx \frac{n_{11}}{n_{11} + n_{12}}, \]

\[ P(B|A) \approx \frac{n_{11}}{n_{11} + n_{21}}, \quad \text{and} \quad P(A \cap B) \approx \frac{n_{11}}{N}, \]

where $\approx$ is read approximately equal to.
With these probabilities, it is easy to see that

\[ P(B|A) \approx \frac{P(A \cap B)}{P(A)} \quad \text{and} \quad P(A|B) \approx \frac{P(A \cap B)}{P(B)}. \]

Hence, Definition 2.9 is consistent with the relative frequency concept of probability.


Table 2.1 Table for events A and B

              $A$                  $\bar{A}$
$B$           $n_{11}$             $n_{12}$             $n_{11} + n_{12}$
$\bar{B}$     $n_{21}$             $n_{22}$             $n_{21} + n_{22}$
              $n_{11} + n_{21}$    $n_{12} + n_{22}$    $N$


EXAMPLE 2.14

Suppose that a balanced die is tossed once. Use Definition 2.9 to find the probability
of a 1, given that an odd number was obtained.

Solution

Define these events:

A: Observe a 1.
B: Observe an odd number.

We seek the probability of A given that the event B has occurred. The event $A \cap B$
requires the observance of both a 1 and an odd number. In this instance, $A \subset B$,
so $A \cap B = A$ and $P(A \cap B) = P(A) = 1/6$. Also, P(B) = 1/2 and, using
Definition 2.9,

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}. \]

Notice that this result is in complete agreement with our earlier intuitive evaluation
of this probability.
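Definition 2.9 translates directly into a small computation when the sample points are equally likely. The sketch below redoes Example 2.14 with exact fractions; the function names `prob` and `cond_prob` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # balanced die: equally likely points


def prob(event):
    """P(event) as (points in the event) / (points in S)."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


A = {1}          # observe a 1
B = {1, 3, 5}    # observe an odd number
print(cond_prob(A, B))  # 1/3
```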


Suppose that the probability of the occurrence of an event A is unaffected by the
occurrence or nonoccurrence of event B. When this happens, we would be inclined
to say that events A and B are independent. This event relationship is expressed by
the following definition.

DEFINITION 2.10

Two events A and B are said to be independent if any one of the following holds:

$P(A|B) = P(A)$,
$P(B|A) = P(B)$,
$P(A \cap B) = P(A)P(B)$.

Otherwise, the events are said to be dependent.


The notion of independence as a probabilistic concept is in agreement with our ev- 
eryday usage of the word if we carefully consider the events in question. Most would 
agree that “smoking” and "contracting lung cancer" are not independent events and 
would intuitively feel that the probability of contracting lung cancer, given that a 
person smokes, is greater than the (unconditional) probability of contracting lung 
cancer. In contrast, the events “rain today” and “rain a month from today” may well 
be independent. 


EXAMPLE 2.15

Consider the following events in the toss of a single die:

A: Observe an odd number.
B: Observe an even number.
C: Observe a 1 or 2.

a Are A and B independent events?
b Are A and C independent events?

Solution

a To decide whether A and B are independent, we must see whether they satisfy
the conditions of Definition 2.10. In this example, P(A) = 1/2, P(B) = 1/2,
and P(C) = 1/3. Because $A \cap B = \emptyset$, P(A|B) = 0, and it is clear that
$P(A|B) \neq P(A)$. Events A and B are dependent events.

b Are A and C independent? Note that P(A|C) = 1/2 and, as before, P(A) =
1/2. Therefore, P(A|C) = P(A), and A and C are independent.
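The product condition of Definition 2.10 is convenient to check by computer because it does not require dividing by a probability that might be zero. The following sketch applies it to the events of Example 2.15; the function names are our own.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one toss of a balanced die


def prob(event):
    return Fraction(len(event), len(sample_space))


def independent(a, b):
    """Third condition of Definition 2.10: P(A and B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


A = {1, 3, 5}   # odd number
B = {2, 4, 6}   # even number
C = {1, 2}      # a 1 or 2
print(independent(A, B), independent(A, C))  # False True
```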


EXAMPLE 2.16

Three brands of coffee, X, Y, and Z, are to be ranked according to taste by a judge.
Define the following events:

A: Brand X is preferred to Y.
B: Brand X is ranked best.
C: Brand X is ranked second best.
D: Brand X is ranked third best.

If the judge actually has no taste preference and randomly assigns ranks to the
brands, is event A independent of events B, C, and D?

Solution

The six equally likely sample points for this experiment are given by

$E_1$: XYZ,   $E_3$: YXZ,   $E_5$: ZXY,
$E_2$: XZY,   $E_4$: YZX,   $E_6$: ZYX,

where XYZ denotes that X is ranked best, Y is second best, and Z is last.
Then $A = \{E_1, E_2, E_5\}$, $B = \{E_1, E_2\}$, $C = \{E_3, E_5\}$, $D = \{E_4, E_6\}$, and it
follows that

\[ P(A) = 1/2, \quad P(A|B) = \frac{P(A \cap B)}{P(B)} = 1, \quad P(A|C) = 1/2, \quad P(A|D) = 0. \]

Thus, events A and C are independent, but events A and B are dependent. Events A
and D are also dependent.


Exercises


If two events, A and B, are such that P(A) = .5, P(B) = .3, and $P(A \cap B) = .1$, find the
following:

a $P(A|B)$
b $P(B|A)$
c $P(A|A \cup B)$
d $P(A|A \cap B)$
e $P(A \cap B|A \cup B)$


For a certain population of employees, the percentage passing or failing a job competency exam,
listed according to sex, were as shown in the accompanying table. That is, of all the people
taking the exam, 24% were in the male-pass category, 16% were in the male-fail category, and
so forth. An employee is to be selected randomly from this population. Let A be the event that
the employee scores a passing grade on the exam and let M be the event that a male is selected.

                           Sex
Outcome            Male (M)   Female (F)   Total
Pass (A)              24          36          60
Fail ($\bar{A}$)      16          24          40
Total                 40          60         100

a Are the events A and M independent?
b Are the events A and F independent?


Gregor Mendel was a monk who, in 1865, suggested a theory of inheritance based on the 
science of genetics. He identified heterozygous individuals for flower color that had two alleles 
(one r = recessive white color allele and one R = dominant red color allele). When these
individuals were mated, 3/4 of the offspring were observed to have red flowers, and 1/4 had 
white flowers. The following table summarizes this mating; each parent gives one of its alleles 
to form the gene of the offspring. 


Parent 2 


Parent 1 r R 


We assume that each parent is equally likely to give either of the two alleles and that, if either 
one or two of the alleles in a pair is dominant (R), the offspring will have red flowers. What is 
the probability that an offspring has 


a at least one dominant allele?

b at least one recessive allele?

c one recessive allele, given that the offspring has red flowers?


One hundred adults were interviewed in a telephone survey. Of interest were their opinions
regarding the loan burdens of college students and whether the respondent had a child currently
in college. Their responses are summarized in the table below:


                                   Loan Burden
Child in College   Too High (A)   About Right (B)   Too Little (C)   Total
Yes (D)                .20             .09               .01           .30
No (E)                 .41             .21               .08           .70
Total                  .61             .30               .09          1.00

Which of the following are independent events?

a A and D
b B and D
c C and D


Cards are dealt, one at a time, from a standard 52-card deck. 


a If the first 2 cards are both spades, what is the probability that the next 3 cards are also 
spades? 

b If the first 3 cards are all spades, what is the probability that the next 2 cards are also 
spades? 

c If the first 4 cards are all spades, what is the probability that the next card is also a spade?


A survey of consumers in a particular community showed that 10% were dissatisfied with
plumbing jobs done in their homes. Half the complaints dealt with plumber A, who does 40%
of the plumbing jobs in the town. Find the probability that a consumer will obtain

a an unsatisfactory plumbing job, given that the plumber was A.
b a satisfactory plumbing job, given that the plumber was A.


A study of the posttreatment behavior of a large number of drug abusers suggests that the
likelihood of conviction within a two-year period after treatment may depend upon the offender's
education. The proportions of the total number of cases falling in four education-conviction
categories are shown in the following table:

                       Status within 2 Years
                          after Treatment
Education            Convicted   Not Convicted   Total
10 years or more        .10           .30          .40
9 years or less         .27           .33          .60
Total                   .37           .63         1.00


Suppose that a single offender is selected from the treatment program. Define the events: 


A: The offender has 10 or more years of education. 
B: The offender is convicted within two years after completion of treatment. 


Find the following: 


a $P(A)$.
b $P(B)$.
c $P(A \cap B)$.
d $P(A \cup B)$.
e $P(\bar{A})$.
f $P(\overline{A \cup B})$.
g $P(\overline{A \cap B})$.
h $P(A|B)$.
i $P(B|A)$.


In the definition of the independence of two events, you were given three equalities to check:
$P(A|B) = P(A)$ or $P(B|A) = P(B)$ or $P(A \cap B) = P(A)P(B)$. If any one of these equalities
holds, A and B are independent. Show that if any of these equalities hold, the other two also
hold.


If P(A) > 0, P(B) > 0, and P(A) < P(A|B), show that P(B) < P(B|A).


Suppose that $A \subset B$ and that P(A) > 0 and P(B) > 0. Are A and B independent? Prove your
answer.


Suppose that A and B are mutually exclusive events, with P(A) > 0 and P(B) < 1. Are A
and B independent? Prove your answer.


Suppose that $A \subset B$ and that P(A) > 0 and P(B) > 0. Show that P(B|A) = 1 and P(A|B) =
P(A)/P(B).


If A and B are mutually exclusive events and P(B) > 0, show that 
P(A) 


Two Laws of Probability 


The following two laws give the probabilities of unions and intersections of events. 
As such, they play an important role in the event-composition approach to the solution 
of probability problems. 


THEOREM 2.5 The Multiplicative Law of Probability The probability of the intersection of
two events A and B is

\[ P(A \cap B) = P(A)P(B|A) = P(B)P(A|B). \]

If A and B are independent, then

\[ P(A \cap B) = P(A)P(B). \]

Proof The multiplicative law follows directly from Definition 2.9, the definition of
conditional probability.


Notice that the multiplicative law can be extended to find the probability of the
intersection of any number of events. Thus, twice applying Theorem 2.5, we obtain

\[ P(A \cap B \cap C) = P[(A \cap B) \cap C] = P(A \cap B)P(C|A \cap B)
= P(A)P(B|A)P(C|A \cap B). \]

The probability of the intersection of any number of, say, k events can be obtained in
the same manner:

\[ P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_k) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2)
\cdots P(A_k|A_1 \cap A_2 \cap \cdots \cap A_{k-1}). \]
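As an illustration of the extended multiplicative law (this example is ours, not the text's), let $A_i$ be the event that the ith card dealt from a standard 52-card deck is a spade. The sketch below builds $P(A_1 \cap \cdots \cap A_k)$ one conditional factor at a time and compares the result with a direct count of k-card subsets.

```python
from fractions import Fraction
from math import comb


def prob_first_k_spades(k):
    """P(A_1 and ... and A_k), where A_i is 'the i-th card dealt is a spade',
    as a product of conditional probabilities."""
    p = Fraction(1)
    for i in range(k):
        # Given that i spades have been dealt, 13 - i spades remain
        # among the 52 - i cards still in the deck.
        p *= Fraction(13 - i, 52 - i)
    return p


print(prob_first_k_spades(3))  # 11/850
# Same answer by counting: 3-card subsets of spades over all 3-card subsets.
assert prob_first_k_spades(3) == Fraction(comb(13, 3), comb(52, 3))
```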


The additive law of probability gives the probability of the union of two events.

THEOREM 2.6 The Additive Law of Probability The probability of the union of two events
A and B is

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \]

If A and B are mutually exclusive events, $P(A \cap B) = 0$ and

\[ P(A \cup B) = P(A) + P(B). \]

Proof The proof of the additive law can be followed by inspecting the Venn diagram
in Figure 2.10.
Notice that $A \cup B = A \cup (\bar{A} \cap B)$, where A and $(\bar{A} \cap B)$ are mutually
exclusive events. Further, $B = (\bar{A} \cap B) \cup (A \cap B)$, where $(\bar{A} \cap B)$ and $(A \cap B)$
are mutually exclusive events. Then, by Axiom 3,

\[ P(A \cup B) = P(A) + P(\bar{A} \cap B) \quad \text{and} \quad P(B) = P(\bar{A} \cap B) + P(A \cap B). \]

The equality given on the right implies that $P(\bar{A} \cap B) = P(B) - P(A \cap B)$.
Substituting this expression for $P(\bar{A} \cap B)$ into the expression for $P(A \cup B)$
given in the left-hand equation of the preceding pair, we obtain the desired
result:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \]


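The probability of the union of three events can be obtained by making use of
Theorem 2.6. Observe that

\[
\begin{aligned}
P(A \cup B \cup C) &= P[A \cup (B \cup C)] \\
&= P(A) + P(B \cup C) - P[A \cap (B \cup C)] \\
&= P(A) + P(B) + P(C) - P(B \cap C) - P[(A \cap B) \cup (A \cap C)] \\
&= P(A) + P(B) + P(C) - P(B \cap C) - P(A \cap B) - P(A \cap C) + P(A \cap B \cap C)
\end{aligned}
\]

because $(A \cap B) \cap (A \cap C) = A \cap B \cap C$.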
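The three-event formula can be checked on any finite sample space with equally likely points. The sketch below uses the 36 outcomes of tossing two dice; the three events are arbitrary choices made for this illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for a toss of two dice: 36 equally likely ordered pairs.
space = set(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(len(event), len(space))


A = {s for s in space if s[0] % 2 == 0}   # first die shows an even number
B = {s for s in space if sum(s) == 7}     # the sum is 7
C = {s for s in space if s[1] > 4}        # second die shows 5 or 6

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(B & C) - prob(A & B) - prob(A & C)
       + prob(A & B & C))
assert lhs == rhs
```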


Another useful result expressing the relationship between the probability of an 
event and its complement is immediately available from the axioms of probability. 


FIGURE 2.10 Venn diagram for the union of A and B

THEOREM 2.7 If A is an event, then

\[ P(A) = 1 - P(\bar{A}). \]

Proof Observe that $S = A \cup \bar{A}$. Because A and $\bar{A}$ are mutually exclusive events, it
follows that $P(S) = P(A) + P(\bar{A})$. Therefore, $P(A) + P(\bar{A}) = 1$ and the
result follows.

As we will see in Section 2.9, it is sometimes easier to calculate $P(\bar{A})$ than to
calculate P(A). In such cases, it is easier to find P(A) by the relationship $P(A) =
1 - P(\bar{A})$ than to find P(A) directly.
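A small example of our own shows why the complement is often the easier route. Let A be the event that at least one 6 appears in four tosses of a balanced die. Counting A directly means examining all $6^4$ sample points, whereas $P(\bar{A})$, the probability of no 6 at all, is a single product if the tosses are assumed independent.

```python
from fractions import Fraction
from itertools import product

tosses = list(product(range(1, 7), repeat=4))   # 6**4 equally likely points

# Direct count of A: at least one 6 in four tosses.
p_direct = Fraction(sum(1 for t in tosses if 6 in t), len(tosses))

# Complement: no 6 on any toss, using independence of the tosses.
p_complement = 1 - Fraction(5, 6) ** 4

print(p_direct, p_complement)  # 671/1296 671/1296
```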


Exercises


If $A_1$, $A_2$, and $A_3$ are three events and $P(A_1 \cap A_2) = P(A_1 \cap A_3) \neq 0$ but $P(A_2 \cap A_3) = 0$,
show that

\[ P(\text{at least one } A_i) = P(A_1) + P(A_2) + P(A_3) - 2P(A_1 \cap A_2). \]


If A and B are independent events, show that A and $\bar{B}$ are also independent. Are $\bar{A}$ and $\bar{B}$
independent?


Suppose that A and В are two events such that P(A) = .8 and P(B) = .7. 


Is it possible that P(A П В) = .1? Why or why not? 
What is the smallest possible value for P(A П B)? 

Is it possible that P(A П В) = .77? Why or why not? 
What is the largest possible value for P(A П В)? 


оо c» 


Suppose that A and B are two events such that P(A) + P(B) > 1. 


a What is the smallest possible value for P(A П В)? 
b What is the largest possible value for P(A N B)? 


Suppose that A and B are two events such that P(A) = .6 and P(B) = .3. 


Is it possible that P(A П В) = .1? Why or why not? 
What is the smallest possible value for P(A П B)? 
Is it possible that P(A П В) = .7? Why or why not? 
What is the largest possible value for P(A N В)? 


ana Cc» 


Suppose that A and B are two events such that P(A) + P(B) < 1. 


a What is the smallest possible value for $P(A \cap B)$?
b What is the largest possible value for $P(A \cap B)$?


Suppose that there is a 1 in 50 chance of injury on a single skydiving attempt. 


a If we assume that the outcomes of different jumps are independent, what is the probability 
that a skydiver is injured if she jumps twice? 

b A friend claims if there is a 1 in 50 chance of injury on a single jump then there is a 100% 
chance of injury if a skydiver jumps 50 times. Is your friend correct? Why? 
Can A and B be mutually exclusive if P(A) = .4 and P(B) = .7? If P(A) = .4 and P(B) = .3?
Why? 


A policy requiring all hospital employees to take lie detector tests may reduce losses due to theft, 
but some employees regard such tests as a violation of their rights. Past experience indicates 
that lie detectors have accuracy rates that vary from 92% to 99%.² To gain some insight into the
risks that employees face when taking a lie detector test, suppose that the probability is .05 that 
a lie detector concludes that a person is lying who, in fact, is telling the truth and suppose 
that any pair of tests are independent. What is the probability that a machine will conclude 
that 


a each of three employees is lying when all are telling the truth? 


b at least one of the three employees is lying when all are telling the truth?


Two events A and B are such that P(A) = .2, P(B) = .3, and P(A U B) = 44. Find the 
following: 


a $P(A \cap B)$
b $P(A \cup B)$
c $P(A \cap B)$
d $P(A|B)$


A smoke detector system uses two devices, A and B. If smoke is present, the probability that
it will be detected by device A is .95; by device B, .90; and by both devices, .88.


a If smoke is present, find the probability that the smoke will be detected by either device A
or B or both devices.


b Find the probability that the smoke will be undetected. 


In a game, a participant is given three attempts to hit a ball. On each try, she either scores a 
hit, H, or a miss, M. The game requires that the player must alternate which hand she uses in
successive attempts. That is, if she makes her first attempt with her right hand, she must use 
her left hand for the second attempt and her right hand for the third. Her chance of scoring a 
hit with her right hand is .7 and with her left hand is .4. Assume that the results of successive 
attempts are independent and that she wins the game if she scores at least two hits in a row. 
If she makes her first attempt with her right hand, what is the probability that she wins the 
game? 


If A and B are independent events with P(A) = .5 and P(B) = .2, find the following: 


a $P(A \cup B)$

b $P(A \cap B)$

c $P(A \cup B)$

Consider the following portion of an electric circuit with three relays. Current will flow from 
point a to point b if there is at least one closed path when the relays are activated. The relays 
may malfunction and not close when activated. Suppose that the relays act independently of 
one another and close properly when activated, with a probability of .9. 

a What is the probability that current will flow when the relays are activated? 


b Given that current flowed when the relays were activated, what is the probability that relay 
1 functioned? 


2. Source: Copyright © 1980 Sentinel Communications Co. All rights reserved. 
With relays operating as in Exercise 2.97, compare the probability of current flowing from a
to b in the series system shown


[Circuit diagram: relays connected in series between a and b]


with the probability of flow in the parallel system shown.


Suppose that A and B are independent events such that the probability that neither occurs is a
and the probability of B is b. Show that $P(A) = \dfrac{1 - b - a}{1 - b}$.
Show that Theorem 2.6, the additive law of probability, holds for conditional probabilities. 
That is, if A, B, and C are events such that $P(C) > 0$, prove that $P(A \cup B|C) = P(A|C) +
P(B|C) - P(A \cap B|C)$. [Hint: Make use of the distributive law $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.]


Articles coming through an inspection line are visually inspected by two successive inspectors. 
When a defective article comes through the inspection line, the probability that it gets by the 
first inspector is .1. The second inspector will “miss” five out of ten of the defective items that 
get past the first inspector. What is the probability that a defective item gets by both inspectors? 


Diseases I and II are prevalent among people in a certain population. It is assumed that 10% of
the population will contract disease I sometime during their lifetime, 15% will contract disease 
II eventually, and 3% will contract both diseases. 


a Find the probability that a randomly chosen person from this population will contract at 
least one disease. 


b Find the conditional probability that a randomly chosen person from this population will 
contract both diseases, given that he or she has contracted at least one disease. 


Refer to Exercise 2.50. Hours after the rigging of the Pennsylvania state lottery was announced, 
Connecticut state lottery officials were stunned to learn that their winning number for the day 
was 666 (Los Angeles Times, September 21, 1980). 


a All evidence indicates that the Connecticut selection of 666 was due to pure chance. What
is the probability that a 666 would be drawn in Connecticut, given that a 666 had been 
selected in the April 24, 1980, Pennsylvania lottery? 

b What is the probability of drawing a 666 in the April 24, 1980, Pennsylvania lottery 
(remember, this drawing was rigged) and a 666 in the September 19, 1980, Connecticut 
lottery? 
2.104 If A and B are two events, prove that $P(A \cap B) \geq 1 - P(\overline{A}) - P(\overline{B})$. [Note: This is a simplified
version of the Bonferroni inequality.]


2.105 If the probability of injury on each individual parachute jump is .05, use the result in
Exercise 2.104 to provide a lower bound for the probability of landing safely on both of two jumps.


2.106 If A and B are equally likely events and we require that the probability of their intersection be
at least .98, what is P(A)?


2.107 Let A, B, and C be events such that P(A) > P(B) and P(C) > 0. Construct an example to
demonstrate that it is possible that P(A|C) < P(B|C).


2.108 If A, B, and C are three events, use two applications of the result in Exercise 2.104 to prove
that $P(A \cap B \cap C) \geq 1 - P(\overline{A}) - P(\overline{B}) - P(\overline{C})$.


2.109 If A, B, and C are three equally likely events, what is the smallest value for P(A) such that
$P(A \cap B \cap C)$ always exceeds 0.95?


2.9 Calculating the Probability of an Event:
The Event-Composition Method


We learned in Section 2.4 that sets (events) can often be expressed as unions, intersec- 
tions, or complements of other sets. The event-composition method for calculating 
the probability of an event, A, expresses A as a composition involving unions and/or 
intersections of other events. The laws of probability are then applied to find P(A). 
We will illustrate this method with an example. 


EXAMPLE 2.17 Of the voters in a city, 40% are Republicans and 60% are Democrats. Among the
Republicans 70% are in favor of a bond issue, whereas 80% of the Democrats favor
the issue. If a voter is selected at random in the city, what is the probability that he or
she will favor the bond issue?


Solution Let F denote the event "favor the bond issue," R the event "a Republican is selected,"
and D the event "a Democrat is selected." Then P(R) = .4, P(D) = .6, P(F|R) =
.7, and P(F|D) = .8. Notice that


$P(F) = P[(F \cap R) \cup (F \cap D)] = P(F \cap R) + P(F \cap D)$

because $(F \cap R)$ and $(F \cap D)$ are mutually exclusive events. Figure 2.11 will help
you visualize the result that $F = (F \cap R) \cup (F \cap D)$. Now

$P(F \cap R) = P(F|R)P(R) = (.7)(.4) = .28$,

$P(F \cap D) = P(F|D)P(D) = (.8)(.6) = .48$.

It follows that

$P(F) = .28 + .48 = .76$.


FIGURE 2.11 
Venn diagram 
for events of 
Example 2.17 
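The arithmetic of Example 2.17 can be reproduced in a few lines. This is a minimal sketch using the probabilities stated in the example; the variable names are our own and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# Probabilities given in Example 2.17.
p_R = Fraction(4, 10)          # P(R): a Republican is selected
p_D = Fraction(6, 10)          # P(D): a Democrat is selected
p_F_given_R = Fraction(7, 10)  # P(F|R)
p_F_given_D = Fraction(8, 10)  # P(F|D)

# Multiplicative law applied to each intersection.
p_F_and_R = p_F_given_R * p_R
p_F_and_D = p_F_given_D * p_D

# Additive law for the mutually exclusive events (F and R) and (F and D).
p_F = p_F_and_R + p_F_and_D

print(float(p_F_and_R), float(p_F_and_D), float(p_F))
```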
EXAMPLE 2.18 In Example 2.7 we considered an experiment wherein the birthdays of 20 randomly
selected persons were recorded. Under certain conditions we found that P(A) =
.5886, where A denotes the event that each person has a different birthday. Let B
denote the event that at least one pair of individuals share a birthday. Find P(B).


Solution The event B is the set of all sample points in S that are not in A, that is, $B = \overline{A}$.
Therefore,


$P(B) = 1 - P(A) = 1 - .5886 = .4114$.


(Most would agree that this probability is surprisingly high!)


Let us refer to Example 2.4, which involves the two tennis players, and let $D_1$
and $D_2$ denote the events that player A wins the first and second games, respectively.
The information given in the example implies that $P(D_1) = P(D_2) = 2/3$.
Further, if we make the assumption that $D_1$ and $D_2$ are independent, it follows that
$P(D_1 \cap D_2) = 2/3 \times 2/3 = 4/9$. In that example we identified the simple event $E_1$,
which we denoted AA, as meaning that player A won both games. With the present
notation,


$E_1 = D_1 \cap D_2$,


and thus $P(E_1) = 4/9$. The probabilities assigned to the other simple events in
Example 2.4 can be verified in a similar manner.

The event-composition approach will not be successful unless the probabilities of 
the events that appear in P(A) (after the additive and multiplicative laws have been 
applied) are known. If one or more of these probabilities is unknown, the method fails. 
Often it is desirable to form compositions of mutually exclusive or independent events. 
Mutually exclusive events simplify the use of the additive law and the multiplicative 
law of probability is easier to apply to independent events. 
A summary of the steps used in the event-composition method follows:


1. Define the experiment.

2. Visualize the nature of the sample points. Identify a few to clarify your
thinking.

3. Write an equation expressing the event of interest—say, A—as a 
composition of two or more events, using unions, intersections, and/or 
complements. (Notice that this equates point sets.) Make certain that event 
A and the event implied by the composition represent the same set of 
sample points. 

4. Apply the additive and multiplicative laws of probability to the 
compositions obtained in step 3 to find P(A). 


Step 3 is the most difficult because we can form many compositions that will be 
equivalent to event A. The trick is to form a composition in which all the probabilities 
appearing in step 4 are known. 

The event-composition approach does not require listing the sample points in S, 
but it does require a clear understanding of the nature of a typical sample point. The 
major error students tend to make in applying the event-composition approach occurs 
in writing the composition. That is, the point-set equation that expresses A as union 
and/or intersection of other events is frequently incorrect. Always test your equality 
to make certain that the composition implies an event that contains the same set of 
sample points as those in A. 

A comparison of the sample-point and event-composition methods for calculating 
the probability of an event can be obtained by applying both methods to the same 
problem. We will apply the event-composition approach to the problem of selecting 
applicants that was solved by the sample-point method in Examples 2.11 and 2.12. 


EXAMPLE 2.19 Two applicants are randomly selected from among five who have applied for a job.
Find the probability that exactly one of the two best applicants is selected, event A.


Solution Define the following two events:


B: Draw the best and one of the three poorest applicants.
C: Draw the second best and one of the three poorest applicants.


Events B and C are mutually exclusive and $A = B \cup C$. Also, let $D_1 = B_1 \cap B_2$,
where


$B_1$ = Draw the best on the first draw,
$B_2$ = Draw one of the three poorest applicants on the second draw,


and $D_2 = B_3 \cap B_4$, where


$B_3$ = Draw one of the three poorest applicants on the first draw,
$B_4$ = Draw the best on the second draw.


Note that $B = D_1 \cup D_2$.
Similarly, let $G_1 = C_1 \cap C_2$ and $G_2 = C_3 \cap C_4$, where $C_1$, $C_2$, $C_3$, and $C_4$ are
defined like $B_1$, $B_2$, $B_3$, and $B_4$, with the words second best replacing best. Notice
that $D_1$ and $D_2$ and $G_1$ and $G_2$ are pairs of mutually exclusive events and that


$A = B \cup C = (D_1 \cup D_2) \cup (G_1 \cup G_2)$,
$A = (B_1 \cap B_2) \cup (B_3 \cap B_4) \cup (C_1 \cap C_2) \cup (C_3 \cap C_4)$.


Applying the additive law of probability to these four mutually exclusive events,
we have


$P(A) = P(B_1 \cap B_2) + P(B_3 \cap B_4) + P(C_1 \cap C_2) + P(C_3 \cap C_4)$.

Applying the multiplicative law, we have

$P(B_1 \cap B_2) = P(B_1)P(B_2|B_1)$.

The probability of drawing the best on the first draw is


$P(B_1) = 1/5$.


Similarly, the probability of drawing one of the three poorest on the second draw,
given that the best was drawn on the first selection, is


$P(B_2|B_1) = 3/4$.

Then

$P(B_1 \cap B_2) = P(B_1)P(B_2|B_1) = (1/5)(3/4) = 3/20$.


The probabilities of all other intersections in P(A), $P(B_3 \cap B_4)$, $P(C_1 \cap C_2)$, and
$P(C_3 \cap C_4)$ are obtained in exactly the same manner, and all equal 3/20. Then

$P(A) = P(B_1 \cap B_2) + P(B_3 \cap B_4) + P(C_1 \cap C_2) + P(C_3 \cap C_4)$
$= (3/20) + (3/20) + (3/20) + (3/20) = 3/5$.


This answer is identical to that obtained in Example 2.12, where P(A) was calcu- 
lated by using the sample-point approach. 
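The agreement between the two methods in Example 2.19 can also be confirmed by a short computation. The sketch below assumes the applicants are labeled 1 (best) through 5 (poorest), a labeling introduced here only for illustration, and compares a direct count of sample points with the event-composition result.

```python
from fractions import Fraction
from itertools import permutations

# Assumed labeling: applicants ranked 1 (best) to 5 (poorest).
# Ordered draws of two applicants without replacement: 20 equally likely points.
draws = list(permutations(range(1, 6), 2))

# Sample-point method: count draws containing exactly one of the two best.
favorable = [d for d in draws if sum(1 for x in d if x <= 2) == 1]
p_sample_point = Fraction(len(favorable), len(draws))

# Event-composition method: four mutually exclusive intersections.
p_best_then_poor = Fraction(1, 5) * Fraction(3, 4)   # P(B1)P(B2|B1)
p_poor_then_best = Fraction(3, 5) * Fraction(1, 4)   # P(B3)P(B4|B3)
# The same two values apply with "second best" in place of "best".
p_composition = 2 * (p_best_then_poor + p_poor_then_best)

print(p_sample_point, p_composition)
```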


EXAMPLE 2.20 It is known that a patient with a disease will respond to treatment with probability
equal to .9. If three patients with the disease are treated and respond independently,
find the probability that at least one will respond.


Solution Define the following events:

A: At least one of the three patients will respond.
$B_1$: The first patient will not respond.
$B_2$: The second patient will not respond.
$B_3$: The third patient will not respond.
Then observe that $\overline{A} = B_1 \cap B_2 \cap B_3$. Theorem 2.7 implies that

$P(A) = 1 - P(\overline{A}) = 1 - P(B_1 \cap B_2 \cap B_3)$.

Applying the multiplicative law, we have

$P(B_1 \cap B_2 \cap B_3) = P(B_1)P(B_2|B_1)P(B_3|B_1 \cap B_2)$,

where, because the events are independent,

$P(B_2|B_1) = P(B_2) = 0.1$ and $P(B_3|B_1 \cap B_2) = P(B_3) = 0.1$.

Substituting $P(B_i) = .1$, $i = 1, 2, 3$, we obtain

$P(A) = 1 - (.1)^3 = .999$.


Notice that we have demonstrated the utility of complementary events. This result
is important because frequently it is easier to find the probability of the complement,
$P(\overline{A})$, than to find $P(A)$ directly.
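To see how much work the complement saves in Example 2.20, the sketch below computes the probability of at least one response twice: once through the complement and once by summing over every outcome that contains at least one response. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

p_respond = Fraction(9, 10)      # P(a patient responds), from Example 2.20
p_no_response = 1 - p_respond    # P(B_i) = .1

# Complement route: one product and one subtraction.
p_via_complement = 1 - p_no_response ** 3

# Direct route: add the probabilities of the 7 outcomes with a response.
p_direct = Fraction(0)
for outcome in product([True, False], repeat=3):   # True means "responds"
    if any(outcome):
        p_outcome = Fraction(1)
        for responded in outcome:
            # Independence lets us multiply the individual probabilities.
            p_outcome *= p_respond if responded else p_no_response
        p_direct += p_outcome

print(float(p_via_complement), float(p_direct))
```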


EXAMPLE 2.21 Observation of a waiting line at a medical clinic indicates the probability that a new
arrival will be an emergency case is p = 1/6. Find the probability that the rth patient
is the first emergency case. (Assume that conditions of arriving patients represent
independent events.)


Solution The experiment consists of watching patient arrivals until the first emergency case
appears. Then the sample points for the experiment are


$E_i$: The ith patient is the first emergency case, for i = 1, 2, ....


Because only one sample point falls in the event of interest,

$P(\text{rth patient is the first emergency case}) = P(E_r)$.


Now define $A_i$ to denote the event that the ith arrival is not an emergency case.
Then we can represent $E_r$ as the intersection


$E_r = A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_{r-1} \cap \overline{A}_r$.

Applying the multiplicative law, we have

$P(E_r) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) \cdots P(\overline{A}_r|A_1 \cap \cdots \cap A_{r-1})$,

and because the events $A_1, A_2, \ldots, A_{r-1}$, and $\overline{A}_r$ are independent, it follows that

$P(E_r) = P(A_1)P(A_2) \cdots P(A_{r-1})P(\overline{A}_r) = (1 - p)^{r-1} p$
$= (5/6)^{r-1}(1/6)$, r = 1, 2, 3, ....
Notice that

$$
\begin{aligned}
P(S) &= P(E_1) + P(E_2) + P(E_3) + \cdots + P(E_i) + \cdots \\
&= (1/6) + (5/6)(1/6) + (5/6)^2(1/6) + \cdots + (5/6)^{i-1}(1/6) + \cdots \\
&= \sum_{i=0}^{\infty} (5/6)^i (1/6) = \frac{1/6}{1 - (5/6)} = 1.
\end{aligned}
$$

This result follows from the formula for the sum of a geometric series given in
Appendix A1.11. This formula, which states that if $|r| < 1$, $\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}$, is
useful in many simple probability problems.
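The probabilities in Example 2.21 and the fact that they sum to 1 can be explored numerically. This is a minimal sketch; the function names `p_first_emergency` and `partial_sum` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction

p = Fraction(1, 6)   # probability that an arrival is an emergency case

def p_first_emergency(r):
    """P(E_r) = (1 - p)^(r - 1) * p: first emergency is the rth patient."""
    return (1 - p) ** (r - 1) * p

def partial_sum(n):
    """P(E_1) + ... + P(E_n), the chance the first emergency is among n arrivals."""
    return sum(p_first_emergency(r) for r in range(1, n + 1))

# The partial sums equal 1 - (5/6)^n and therefore approach P(S) = 1.
for n in (1, 5, 20, 50):
    print(n, float(partial_sum(n)))
```

The closed form of the partial sum, $1 - (5/6)^n$, follows from the finite geometric series and shows directly why the infinite sum equals 1.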


EXAMPLE 2.22 A monkey is to demonstrate that she recognizes colors by tossing one red, one black,
and one white ball into boxes of the same respective colors, one ball to a box. If the
monkey has not learned the colors and merely tosses one ball into each box at random,
find the probabilities of the following results:


a There are no color matches.

b There is exactly one color match.


Solution This problem can be solved by listing sample points because only three balls are
involved, but a more general method will be illustrated. Define the following events:


$A_1$: A color match occurs in the red box.
$A_2$: A color match occurs in the black box.
$A_3$: A color match occurs in the white box.


There are 3! = 6 equally likely ways of randomly tossing the balls into the boxes
with one ball in each box. Also, there are only 2! = 2 ways of tossing the balls into
the boxes if one particular box is required to have a color match. Hence,


$P(A_1) = P(A_2) = P(A_3) = 2/6 = 1/3$.

Similarly, it follows that

$P(A_1 \cap A_2) = P(A_1 \cap A_3) = P(A_2 \cap A_3) = P(A_1 \cap A_2 \cap A_3) = 1/6$.


We can now answer parts (a) and (b) by using the event-composition method.

a Notice that

$$
\begin{aligned}
P(\text{no color matches}) &= 1 - P(\text{at least one color match}) \\
&= 1 - P(A_1 \cup A_2 \cup A_3) \\
&= 1 - [P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) \\
&\qquad - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)] \\
&= 1 - [3(1/3) - 3(1/6) + (1/6)] = 2/6 = 1/3.
\end{aligned}
$$
b We leave it to you to show that

$$
\begin{aligned}
P(\text{exactly one match}) &= P(A_1) + P(A_2) + P(A_3) \\
&\qquad - 2[P(A_1 \cap A_2) + P(A_1 \cap A_3) + P(A_2 \cap A_3)] \\
&\qquad + 3[P(A_1 \cap A_2 \cap A_3)] \\
&= (3)(1/3) - (2)(3)(1/6) + (3)(1/6) = 1/2.
\end{aligned}
$$
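Because only three balls are involved, the results of Example 2.22 can be checked by listing all 3! arrangements, as the solution itself notes. The sketch below does that listing by machine; the helper names `matches` and `prob_matches` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import permutations

boxes = ("red", "black", "white")

# Each arrangement lists the ball color that lands in the red, black,
# and white box, in that order. All 3! = 6 arrangements are equally likely.
arrangements = list(permutations(boxes))

def matches(arrangement):
    """Number of boxes whose ball has the same color as the box."""
    return sum(ball == box for ball, box in zip(arrangement, boxes))

def prob_matches(k):
    """Probability of exactly k color matches."""
    count = sum(1 for a in arrangements if matches(a) == k)
    return Fraction(count, len(arrangements))

for k in range(4):
    print(k, prob_matches(k))
```

Exactly two matches is impossible, since matching two boxes forces the third ball into its own box; this is why all the pairwise intersections in the example have the same probability as the triple intersection.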
The best way to learn how to solve probability problems is to learn by doing. To
assist you in developing your skills, many exercises are provided at the end of this 
section, at the end of the chapter, and in the references. 


Exercises 


Of the items produced daily by a factory, 40% come from line I and 60% from line II. Line I
has a defect rate of 8%, whereas line II has a defect rate of 10%. If an item is chosen at random 
from the day’s production, find the probability that it will not be defective. 


An advertising agency notices that approximately 1 in 50 potential buyers of a product sees 
a given magazine ad, and 1 in 5 sees a corresponding ad on television. One in 100 sees both. 
One in 3 actually purchases the product after seeing the ad, 1 in 10 without seeing it. What is 
the probability that a randomly selected potential customer will purchase the product? 


Three radar sets, operating independently, are set to detect any aircraft flying through a certain 
area. Each set has a probability of .02 of failing to detect a plane in its area. If an aircraft enters 
the area, what is the probability that it 


a goes undetected? 
b is detected by all three radar sets? 


Consider one of the radar sets of Exercise 2.112. What is the probability that it will correctly 
detect exactly three aircraft before it fails to detect one, if aircraft arrivals are independent 
single events occurring at different times? 


A lie detector will show a positive reading (indicate a lie) 10% of the time when a person is
telling the truth and 95% of the time when the person is lying. Suppose two people are suspects
in a one-person crime and (for certain) one is guilty and will lie. Assume further that the lie 
detector operates independently for the truthful person and the liar. What is the probability that 
the detector 


a shows a positive reading for both suspects? 

b shows a positive reading for the guilty suspect and a negative reading for the innocent 
suspect? 

c is completely wrong—that is, that it gives a positive reading for the innocent suspect and 
a negative reading for the guilty? 


d gives a positive reading for either or both of the two suspects? 
A state auto-inspection station has two inspection teams. Team 1 is lenient and passes all
automobiles of a recent vintage; team 2 rejects all autos on a first inspection because their 
“headlights are not properly adjusted.” Four unsuspecting drivers take their autos to the station 
for inspection on four different days and randomly select one of the two teams. 


a If all four cars are new and in excellent condition, what is the probability that three of the
four will be rejected? 


b What is the probability that all four will pass? 


A communications network has a built-in safeguard system against failures. In this system if 
line I fails, it is bypassed and line II is used. If line II also fails, it is bypassed and line III is 
used. The probability of failure of any one of these three lines is .01, and the failures of these 
lines are independent events. What is the probability that this system of three lines does not 
completely fail? 


A football team has a probability of .75 of winning when playing any of the other four teams 
in its conference. If the games are independent, what is the probability the team wins all its 
conference games? 


An accident victim will die unless in the next 10 minutes he receives some type A, Rh-positive 
blood, which can be supplied by a single donor. The hospital requires 2 minutes to type a 
prospective donor’s blood and 2 minutes to complete the transfer of blood. Many untyped 
donors are available, and 40% of them have type A, Rh-positive blood. What is the probability 
that the accident victim will be saved if only one blood-typing kit is available? Assume that 
the typing kit is reusable but can process only one donor at a time. 


Suppose that two balanced dice are tossed repeatedly and the sum of the two uppermost faces 
is determined on each toss. What is the probability that we obtain 


a a sum of 3 before we obtain a sum of 7?


b a sum of 4 before we obtain a sum of 7?


Suppose that two defective refrigerators have been included in a shipment of six refrigerators. 
The buyer begins to test the six refrigerators one at a time. 


a What is the probability that the last defective refrigerator is found on the fourth test? 


b What is the probability that no more than four refrigerators need to be tested to locate both 
of the defective refrigerators? 

c When given that exactly one of the two defective refrigerators has been located in the first
two tests, what is the probability that the remaining defective refrigerator is found in the 
third or fourth test? 


A new secretary has been given n computer passwords, only one of which will permit access
to a computer file. Because the secretary has no idea which password is correct, he chooses 
one of the passwords at random and tries it. If the password is incorrect, he discards it and 
randomly selects another password from among those remaining, proceeding in this manner 
until he finds the correct password. 


a What is the probability that he obtains the correct password on the first try?
b What is the probability that he obtains the correct password on the second try? The third try?


c A security system has been set up so that if three incorrect passwords are tried before
the correct one, the computer file is locked and access to it denied. If n = 7, what is the 
probability that the secretary will gain access to the file? 
2.10 The Law of Total Probability
and Bayes' Rule


The event-composition approach to solving probability problems is sometimes facil- 
itated by viewing the sample space, S, as a union of mutually exclusive subsets and 
using the following law of total probability. The results of this section are based on 
the following construction. 


DEFINITION 2.11 For some positive integer k, let the sets $B_1, B_2, \ldots, B_k$ be such that


1. $S = B_1 \cup B_2 \cup \cdots \cup B_k$.
2. $B_i \cap B_j = \emptyset$, for $i \neq j$.


Then the collection of sets $\{B_1, B_2, \ldots, B_k\}$ is said to be a partition of S.


If A is any subset of S and $\{B_1, B_2, \ldots, B_k\}$ is a partition of S, A can be decomposed
as follows:


$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_k)$.


Figure 2.12 illustrates this decomposition for k = 3.


THEOREM 2.8 Assume that $\{B_1, B_2, \ldots, B_k\}$ is a partition of S (see Definition 2.11) such that
$P(B_i) > 0$, for $i = 1, 2, \ldots, k$. Then for any event A


$P(A) = \sum_{i=1}^{k} P(A|B_i)P(B_i)$.


Proof Any subset A of S can be written as

$A = A \cap S = A \cap (B_1 \cup B_2 \cup \cdots \cup B_k)$
$= (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_k)$.

Notice that, because $\{B_1, B_2, \ldots, B_k\}$ is a partition of S, if $i \neq j$,

$(A \cap B_i) \cap (A \cap B_j) = A \cap (B_i \cap B_j) = A \cap \emptyset = \emptyset$

and that $(A \cap B_i)$ and $(A \cap B_j)$ are mutually exclusive events. Thus,

$$
\begin{aligned}
P(A) &= P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k) \\
&= P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \cdots + P(A|B_k)P(B_k) \\
&= \sum_{i=1}^{k} P(A|B_i)P(B_i).
\end{aligned}
$$


In the examples and exercises that follow, you will see that it is sometimes much
easier to calculate the conditional probabilities $P(A|B_i)$ for suitably chosen $B_i$ than it
is to compute $P(A)$ directly. In such cases, the law of total probability can be applied
to determine $P(A)$. Using the result of Theorem 2.8, it is a simple matter to derive
the result known as Bayes' rule.


FIGURE 2.12
Decomposition of
event A


THEOREM 2.9 Bayes' Rule Assume that $\{B_1, B_2, \ldots, B_k\}$ is a partition of S (see Definition
2.11) such that $P(B_i) > 0$, for $i = 1, 2, \ldots, k$. Then


$P(B_j|A) = \dfrac{P(A|B_j)P(B_j)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}$.


Proof The proof follows directly from the definition of conditional probability and
the law of total probability. Note that


$P(B_j|A) = \dfrac{P(A \cap B_j)}{P(A)} = \dfrac{P(A|B_j)P(B_j)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}$.
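Theorems 2.8 and 2.9 translate directly into a pair of small functions. The sketch below is one possible way to organize the computation, not a prescribed implementation; the function names and the three-set partition with its numbers are hypothetical and serve only to exercise the formulas.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Theorem 2.8: P(A) = sum over i of P(A|B_i) P(B_i).

    priors[i] is P(B_i) and likelihoods[i] is P(A|B_i) for a partition
    B_1, ..., B_k of the sample space.
    """
    if sum(priors) != 1:
        raise ValueError("the P(B_i) of a partition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

def bayes(priors, likelihoods):
    """Theorem 2.9: the list of P(B_j|A) for j = 1, ..., k."""
    p_a = total_probability(priors, likelihoods)
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

# Hypothetical partition with k = 3 (illustrative numbers only).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(2, 10), Fraction(5, 10)]

p_a = total_probability(priors, likelihoods)
posteriors = bayes(priors, likelihoods)
print(p_a, posteriors)
```

Note that the denominator in Bayes' rule is the same for every j, so the conditional probabilities $P(B_j|A)$ always sum to 1 over the partition.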


EXAMPLE 2.23 An electronic fuse is produced by five production lines in a manufacturing operation. 
The fuses are costly, are quite reliable, and are shipped to suppliers in 100-unit lots. 
Because testing is destructive, most buyers of the fuses test only a small number of 
fuses before deciding to accept or reject lots of incoming fuses. 

All five production lines produce fuses at the same rate and normally produce
only 2% defective fuses, which are dispersed randomly in the output. Unfortunately, 
production line 1 suffered mechanical difficulty and produced 5% defectives during 
the month of March. This situation became known to the manufacturer after the fuses 
had been shipped. A customer received a lot produced in March and tested three fuses. 
One failed. What is the probability that the lot was produced on line 1? What is the 
probability that the lot came from one of the four other lines? 


Solution Let B denote the event that a fuse was drawn from line 1 and let A denote the event 
that a fuse was defective. Then it follows directly that 


$P(B) = 0.2$ and $P(A|B) = 3(.05)(.95)^2 = .135375$.
FIGURE 2.13
Tree diagram for calculations in Example 2.23. ~A and ~B are alternative notations
for $\overline{A}$ and $\overline{B}$, respectively. Branch products shown in the tree: 0.0271 for B and A,
0.1729 for B and ~A, and 0.0461 for ~B and A, so that
P(B|A) = 0.0271 / (0.0271 + 0.0461) = 0.3700.


Similarly,


$P(\overline{B}) = 0.8$ and $P(A|\overline{B}) = 3(.02)(.98)^2 = .057624$.


Note that these conditional probabilities were very easy to calculate. Using the law
of total probability,


$P(A) = P(A|B)P(B) + P(A|\overline{B})P(\overline{B})$
$= (.135375)(.2) + (.057624)(.8) = .0731742$.


Finally,

$P(B|A) = \dfrac{P(B \cap A)}{P(A)} = \dfrac{P(A|B)P(B)}{P(A)} = \dfrac{(.135375)(.2)}{.0731742} = .37$

and

$P(\overline{B}|A) = 1 - P(B|A) = 1 - .37 = .63$.


Figure 2.13, obtained using the applet Bayes' Rule as a Tree, illustrates the various
steps in the computation of P(B|A).
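The numbers in Example 2.23 can be reproduced as follows. This sketch follows the example in treating the three tested fuses as independent trials, so that the chance of exactly one defective is a binomial probability; the helper `binom_pmf` and the variable names are our own.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k defectives among n independent fuses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_line1 = 0.2                              # P(B): five lines, equal output
p_A_given_line1 = binom_pmf(1, 3, 0.05)    # 3(.05)(.95)^2
p_A_given_other = binom_pmf(1, 3, 0.02)    # 3(.02)(.98)^2

# Law of total probability over the partition {line 1, not line 1}.
p_A = p_A_given_line1 * p_line1 + p_A_given_other * (1 - p_line1)

# Bayes' rule.
p_line1_given_A = p_A_given_line1 * p_line1 / p_A
p_other_given_A = 1 - p_line1_given_A

print(round(p_A, 7), round(p_line1_given_A, 2), round(p_other_given_A, 2))
```

Changing the defect rate passed to `binom_pmf` for line 1 shows how the conditional probability for line 1 responds, which is the kind of exploration the applet exercises ask for.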


Exercises


2.122 Applet Exercise Use the applet Bayes' Rule as a Tree to obtain the results given in Figure 2.13.


2.123 Applet Exercise Refer to Exercise 2.122 and Example 2.23. Suppose that lines 2 through
5 remained the same, but line 1 was partially repaired and produced a smaller percentage
of defects.


a What impact would this have on P(A|B)?

b Suppose that P(A|B) decreased to .12 and all other probabilities remained unchanged. Use
the applet Bayes' Rule as a Tree to re-evaluate P(B|A).

c How does the answer you obtained in part (b) compare to that obtained in Exercise 2.122?
Are you surprised by this result?

d Assume that all probabilities remain the same except P(A|B). Use the applet and trial and
error to find the value of P(A|B) for which P(B|A) = .3000.

e If line 1 produces only defective items but all other probabilities remain unchanged, what
is P(B|A)?

f A friend expected the answer to part (e) to be 1. Explain why, under the conditions of part
(e), $P(B|A) \neq 1$.


2.124 A population of voters contains 40% Republicans and 60% Democrats. It is reported that
30% of the Republicans and 70% of the Democrats favor an election issue. A person chosen
at random from this population is found to favor the issue in question. Find the conditional
probability that this person is a Democrat.


2.125 A diagnostic test for a disease is such that it (correctly) detects the disease in 90% of the
individuals who actually have the disease. Also, if a person does not have the disease, the test
will report that he or she does not have it with probability .9. Only 1% of the population has the
disease in question. If a person is chosen at random from the population and the diagnostic test
indicates that she has the disease, what is the conditional probability that she does, in fact, have
the disease? Are you surprised by the answer? Would you call this diagnostic test reliable?


2.126 Applet Exercise Refer to Exercise 2.125. The probability that the test detects the disease given
that the patient has the disease is called the sensitivity of the test. The specificity of the test is the
probability that the test indicates no disease given that the patient is disease free. The positive
predictive value of the test is the probability that the patient has the disease given that the test
indicates that the disease is present. In Exercise 2.125, the disease in question was relatively
rare, occurring with probability .01, and the test described has sensitivity = specificity = .90
and positive predictive value = .0833.


a In an effort to increase the positive predictive value of the test, the sensitivity was increased
to .95 and the specificity remained at .90. What is the positive predictive value of the
"improved" test?

b Still not satisfied with the positive predictive value of the procedure, the sensitivity of the
test is increased to .999. What is the positive predictive value of the (now twice) modified
test if the specificity stays at .90?

c Look carefully at the various numbers that were used to compute the positive predictive
value of the tests. Why are all of the positive predictive values so small? [Hint: Compare
the size of the numerator and the denominator used in the fraction that yields the value of
the positive predictive value. Why is the denominator so (relatively) large?]

d The proportion of individuals with the disease is not subject to our control. If the sensitivity
of the test is .90, is it possible that the positive predictive value of the test can be increased
to a value above .5? How? [Hint: Consider improving the specificity of the test.]

e Based on the results of your calculations in the previous parts, if the disease in question
is relatively rare, how can the positive predictive value of a diagnostic test be significantly
increased?


2.127 Applet Exercise Refer to Exercises 2.125 and 2.126. Suppose now that the disease is not
particularly rare and occurs with probability .4.

a If, as in Exercise 2.125, a test has sensitivity = specificity = .90, what is the positive
predictive value of the test? 

b Why is the value of the positive predictive value of the test so much higher that the value 
obtained in Exercise 2.125? [Hint: Compare the size of the numerator and the denominator 
used in the fraction that yields the value of the positive predictive value.] 

c If the specificity of the test remains .90, can the sensitivity of the test be adjusted to obtain
a positive predictive value above .87? 

d If the sensitivity remains at .90, can the specificity be adjusted to obtain a positive predictive
value above .95? How? 

e The developers of a diagnostic test want the test to have a high positive predictive value. 
Based on your calculations in previous parts of this problem and in Exercise 2.126, is the 
value of the specificity more or less critical when developing a test for a rarer disease? 


A plane is missing and is presumed to have equal probability of going down in any of three 
regions. If a plane is actually down in region i, let $1 - \alpha_i$ denote the probability that the plane
will be found upon a search of the ith region, i = 1, 2, 3. What is the conditional probability 
that the plane is in 


a region 1, given that the search of region 1 was unsuccessful? 
b region 2, given that the search of region 1 was unsuccessful? 


c region 3, given that the search of region 1 was unsuccessful?


Males and females are observed to react differently to a given set of circumstances. It has 
been observed that 70% of the females react positively to these circumstances, whereas only 
40% of males react positively. A group of 20 people, 15 female and 5 male, was subjected 
to these circumstances, and the subjects were asked to describe their reactions on a written 
questionnaire. A response picked at random from the 20 was negative. What is the probability
that it was that of a male? 


A study of Georgia residents suggests that those who worked in shipyards during World War II 
were subjected to a significantly higher risk of lung cancer (Wall Street Journal, September 21, 
1978).³ It was found that approximately 22% of those persons who had lung cancer worked at
some prior time in a shipyard. In contrast, only 14% of those who had no lung cancer worked 
at some prior time in a shipyard. Suppose that the proportion of all Georgians living during 
World War II who have or will have contracted lung cancer is .04%. Find the percentage of 
Georgians living during the same period who will contract (or have contracted) lung cancer, 
given that they have at some prior time worked in a shipyard. 


The symmetric difference between two events A and B is the set of all sample points that are
in exactly one of the sets and is often denoted $A \triangle B$. Note that $A \triangle B = (A \cap \bar{B}) \cup (\bar{A} \cap B)$.
Prove that $P(A \triangle B) = P(A) + P(B) - 2P(A \cap B)$.


Use Theorem 2.8, the law of total probability, to prove the following: 


a If $P(A|B) = P(A|\bar{B})$, then A and B are independent.
b If $P(A|C) > P(B|C)$ and $P(A|\bar{C}) > P(B|\bar{C})$, then $P(A) > P(B)$.


A student answers a multiple-choice examination question that offers four possible answers.
Suppose the probability that the student knows the answer to the question is .8 and the probability
that the student will guess is .2. Assume that if the student guesses, the probability of
selecting the correct answer is .25. If the student correctly answers a question, what is the
probability that the student really knew the correct answer?


3. Source: Wall Street Journal, © Dow Jones & Company, Inc. 1981. All rights reserved worldwide.


Two methods, A and B, are available for teaching a certain industrial skill. The failure rate is 
20% for A and 10% for B. However, B is more expensive and hence is used only 30% of the 
time. (A is used the other 70%.) A worker was taught the skill by one of the methods but failed
to learn it correctly. What is the probability that she was taught by method A? 


Of the travelers arriving at a small airport, 60% fly on major airlines, 30% fly on privately 
owned planes, and the remainder fly on commercially owned planes not belonging to a major 
airline. Of those traveling on major airlines, 50% are traveling for business reasons, whereas 
60% of those arriving on private planes and 90% of those arriving on other commercially owned 
planes are traveling for business reasons. Suppose that we randomly select one person arriving 
at this airport. What is the probability that the person 


a is traveling on business? 
b is traveling for business on a privately owned plane?
c arrived on a privately owned plane, given that the person is traveling for business reasons?


d is traveling on business, given that the person is flying on a commercially owned plane? 


A personnel director has two lists of applicants for jobs. List 1 contains the names of five 
women and two men, whereas list 2 contains the names of two women and six men. A name is
randomly selected from list 1 and added to list 2. A name is then randomly selected from the 
augmented list 2. Given that the name selected is that of a man, what is the probability that a 
woman's name was originally selected from list 1? 


Five identical bowls are labeled 1, 2, 3, 4, and 5. Bowl i contains i white and 5 - i black
balls, with i = 1, 2, ..., 5. A bowl is randomly selected and two balls are randomly selected 
(without replacement) from the contents of the bowl. 


a What is the probability that both balls selected are white? 
b Given that both balls selected are white, what is the probability that bowl 3 was selected? 


Following is a description of the game of craps. A player rolls two dice and computes the total 
of the spots showing. If the player's first toss is a 7 or an 11, the player wins the game. If the 
first toss is a 2, 3, or 12, the player loses the game. If the player rolls anything else (4, 5, 6, 8, 9 
or 10) on the first toss, that value becomes the player's point. If the player does not win or lose 
on the first toss, he tosses the dice repeatedly until he obtains either his point or a 7. He wins 
if he tosses his point before tossing a 7 and loses if he tosses a 7 before his point. What is the 
probability that the player wins a game of craps? [Hint: Recall Exercise 2.119.] 


Numerical Events and Random Variables 


Events of major interest to the scientist, engineer, or businessperson are those identi- 
fied by numbers, called numerical events. The research physician is interested in the 
event that ten of ten treated patients survive an illness; the businessperson is inter- 
ested in the event that sales next year will reach $5 million. Let Y denote a variable 
to be measured in an experiment. Because the value of Y will vary depending on the 
outcome of the experiment, it is called a random variable. 

To each point in the sample space we will assign a real number denoting the value 
of the variable Y. The value assigned to Y will vary from one sample point to another,
but some points may be assigned the same numerical value. Thus, we have defined
a variable that is a function of the sample points in S, and {all sample points where
Y = a} is the numerical event assigned the number a. Indeed, the sample space S can
be partitioned into subsets so that points within a subset are all assigned the same value 
of Y. These subsets are mutually exclusive since no point is assigned two different 
numerical values. The partitioning of S is symbolically indicated in Figure 2.14 for a 
random variable that can assume values 0, 1, 2, 3, and 4. 


A random variable is a real-valued function for which the domain is a sample 
space. 


EXAMPLE 2.24


Define an experiment as tossing two coins and observing the results. Let Y equal the
number of heads obtained. Identify the sample points in S, assign a value of Y to
each sample point, and identify the sample points associated with each value of the
random variable Y.


Solution


Let H and T represent head and tail, respectively; and let an ordered pair of symbols
identify the outcome for the first and second coins. (Thus, HT implies a head on the
first coin and a tail on the second.) Then the four sample points in S are E1: HH, E2:
HT, E3: TH, and E4: TT. The values of Y assigned to the sample points depend
on the number of heads associated with each point. For E1: HH, two heads were
observed, and E1 is assigned the value Y = 2. Similarly, we assign the values Y = 1
to E2 and E3 and Y = 0 to E4. Summarizing, the random variable Y can take three
values, Y = 0, 1, and 2, which are events defined by specific collections of sample
points:


{Y = 0} = {E4}, {Y = 1} = {E2, E3}, {Y = 2} = {E1}.


Let y denote an observed value of the random variable Y. Then P(Y = y) is the 
sum of the probabilities of the sample points that are assigned the value y. 
EXAMPLE 2.25


Compute the probabilities for each value of Y in Example 2.24.


Solution


The event {Y = 0} results only from sample point E4. If the coins are balanced, the
sample points are equally likely; hence,


P(Y = 0) = P(E4) = 1/4.

Similarly,


P(Y = 1) = P(E2) + P(E3) = 1/2 and P(Y = 2) = P(E1) = 1/4.
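
The two-coin example can also be checked by brute-force enumeration. The following Python sketch is an illustration only; the names `sample_space`, `number_of_heads`, and `distribution` are assumptions and do not appear in the text. It treats the random variable as a function on the sample points and sums the probabilities of the points assigned each value.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: ordered pairs such as ('H', 'T').
sample_space = list(product("HT", repeat=2))
point_probability = Fraction(1, len(sample_space))  # balanced coins: equally likely points


def number_of_heads(point):
    """The random variable Y: assigns a real number to each sample point."""
    return point.count("H")


# P(Y = y) is the sum of the probabilities of the sample points assigned the value y.
distribution = defaultdict(Fraction)
for point in sample_space:
    distribution[number_of_heads(point)] += point_probability

for y in sorted(distribution):
    print(f"P(Y = {y}) = {distribution[y]}")
```

Exact fractions are used instead of floating-point numbers so that the output can be compared directly with the hand calculation.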
A more detailed examination of random variables will be undertaken in the next
two chapters. 


Exercises 


Refer to Exercise 2.112. Let the random variable Y represent the number of radar sets that 
detect a particular aircraft. Compute the probabilities associated with each value of Y.


Refer to Exercise 2.120. Let the random variable Y represent the number of defective refrig- 
erators found after three refrigerators have been tested. Compute the probabilities for each 
value of Y.


Refer again to Exercise 2.120. Let the random variable Y represent the number of the test in 
which the last defective refrigerator is identified. Compute the probabilities for each value of Y.


A spinner can land in any of four positions, A, B, C, and D, with equal probability. The 
spinner is used twice, and the position is noted each time. Let the random variable Y denote 
the number of positions on which the spinner did not land. Compute the probabilities for each 
value of Y.


Random Sampling 


As our final topic in this chapter, we move from theory to application and examine 
the nature of experiments conducted in statistics. A statistical experiment involves the 
observation of a sample selected from a larger body of data, existing or conceptual, 
called a population. The measurements in the sample, viewed as observations of the 
values of one or more random variables, are then employed to make an inference 
about the characteristics of the target population. 

How are these inferences made? An exact answer to this question is deferred until 
later, but a general observation follows from our discussion in Section 2.2. There we 
learned that the probability of the observed sample plays a major role in making an 
inference and evaluating the credibility of the inference. 

Without belaboring the point, it is clear that the method of sampling will affect 
the probability of a particular sample outcome. For example, suppose that a fictitious 
population contains only N = 5 elements, from which we plan to take a sample of
size n = 2. You could mix the elements thoroughly and select two in such a way that
all pairs of elements possess an equal probability of selection. A second sampling 
procedure might require selecting a single element, replacing it in the population, and 
then drawing a single element again. The two methods of sample selection are called 
sampling without and with replacement, respectively. 

If all the N = 5 population elements are distinctly different, the probability of
drawing a specific pair, when sampling without replacement, is 1/10. The probability 
of drawing the same specific pair, when sampling with replacement, is 2/25. You can 
easily verify these results. 
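
One way to carry out that verification is to enumerate both sampling schemes. The sketch below is a minimal illustration: the element labels and the choice of target pair are arbitrary assumptions, and a "specific pair" is read as an unordered pair of two different elements.

```python
from fractions import Fraction
from itertools import combinations, product

population = ["a", "b", "c", "d", "e"]  # N = 5 distinct elements (labels are arbitrary)
target = {"a", "b"}                     # one specific pair of different elements

# Without replacement: each unordered pair of distinct elements is equally likely.
pairs = list(combinations(population, 2))
p_without = Fraction(sum(set(pair) == target for pair in pairs), len(pairs))

# With replacement: each ordered result (first draw, second draw) is equally likely.
draws = list(product(population, repeat=2))
p_with = Fraction(sum(set(draw) == target for draw in draws), len(draws))

print(p_without, p_with)  # 1/10 2/25
```

With replacement there are 25 equally likely ordered results, and the target pair can arise in two orders, which is where the 2/25 comes from.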

The point that we make is that the method of sampling, known as the design of an 
experiment, affects both the quantity of information in a sample and the probability of 
observing a specific sample result. Hence, every sampling procedure must be clearly 
described if we wish to make valid inferences from sample to population. 

The study of the design of experiments, the various types of designs along with their 
properties, is a course in itself. Hence, at this early stage of study we introduce only the 
simplest sampling procedure, simple random sampling. The notion of simple random 
sampling will be needed in subsequent discussions of the probabilities associated with 
random variables, and it will inject some realism into our discussion of statistics. This 
is because simple random sampling is often employed in practice. Now let us define 
the term random sample. 


Let N and n represent the numbers of elements in the population and sample, 
respectively. If the sampling is conducted in such a way that each of the $\binom{N}{n}$
samples has an equal probability of being selected, the sampling is said to be 
random, and the result is said to be a random sample. 


Perfect random sampling is difficult to achieve in practice. If the population is not 
too large, we might write each of the N numbers on a poker chip, mix all the chips, 
and select a sample of n chips. The numbers on the poker chips would specify the
measurements to appear in the sample. 
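
In software, the poker-chip procedure is usually replaced by a pseudorandom generator. The following sketch uses Python's standard `random.sample`, which draws distinct elements without replacement. The population labels, the sample size, and the seed are assumptions chosen for illustration, and a pseudorandom generator only approximates the ideal of equally likely samples.

```python
import math
import random

population = list(range(1, 8))  # N = 7 labelled elements (hypothetical labels)
n = 3                           # desired sample size

# Number of distinct samples of size n, the count that appears in the definition.
print(math.comb(len(population), n))  # 35

rng = random.Random(2024)            # fixed seed only so that a run can be repeated
sample = rng.sample(population, n)   # n distinct elements, drawn without replacement
print(sorted(sample))
```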

Tables of random numbers have been formed by computer to expedite the selection 
of random samples. An example of such a table is Table 12, Appendix 3. A random 
number table is a set of integers (0, 1, ..., 9) generated so that, in the long run, the 
table will contain all ten integers in approximately equal proportions, with no trends 
in the patterns in which the digits were generated. Thus, if one digit is selected from 
a random point on the table, it is equally likely to be any of the digits 0 through 9. 

Choosing numbers from the table is analogous to drawing numbered poker chips 
from the mixed pile, as mentioned earlier. Suppose we want a random sample of 
three persons to be selected from a population of seven persons. We could number 
the people from 1 to 7, put the numbers on chips, thoroughly mix the chips, and then 
draw three out. Analogously, we could drop a pencil point on a random starting point 
in Table 12, Appendix 3. Suppose the point falls on the 15th line of column 9 and we 
decide to use the rightmost digit of the group of five, which is a 5 in this case. This 
process is like drawing the chip numbered 5. We may now proceed in any direction to 
obtain the remaining numbers in the sample. If we decide to proceed down the page,
the next number (immediately below the 5) is a 2. So our second sampled person 
would be number 2. Proceeding, we next come to an 8, but there are only seven 
elements in the population. Thus, the 8 is ignored, and we continue down the column. 
Two more 5s then appear, but they must both be ignored because person 5 has already 
been selected. (The chip numbered 5 has been removed from the pile.) Finally, we 
come to a 1, and our sample of three is completed with persons numbered 5, 2, and 1. 
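
The table-reading rule just described (skip labels outside the population, skip labels already chosen) can be sketched as a short Python function. The function name and its parameters are illustrative assumptions; the digit sequence is the one read down the column in the paragraph above.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Read digits in order, ignoring labels outside 1..population_size and repeats."""
    chosen = []
    for digit in digits:
        if 1 <= digit <= population_size and digit not in chosen:
            chosen.append(digit)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of digits before the sample was complete")


# Digits encountered in the text: 5, 2, 8 (ignored), 5 and 5 (repeats), then 1.
print(sample_from_digits([5, 2, 8, 5, 5, 1], population_size=7, sample_size=3))  # [5, 2, 1]
```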

Any starting point can be used in a random number table, and we may proceed in 
any direction from the starting point. However, if more than one sample is to be used 
in any problem, each should have a unique starting point. 

In many situations the population is conceptual, as in an observation made during 
a laboratory experiment. Here the population is envisioned to be the infinitely many 
measurements that would be obtained if the experiment were to be repeated over and 
over again. If we wish a sample of n = 10 measurements from this population, we
repeat the experiment ten times and hope that the results represent, to a reasonable 
degree of approximation, a random sample. 

Although the primary purpose of this discussion was to clarify the meaning of a 
random sample, we would like to mention that some sampling techniques are only 
partially random. For instance, if we wish to determine the voting preference of the 
nation in a presidential election, we would not likely choose a random sample from 
the population of voters. By pure chance, all the voters appearing in the sample 
might be drawn from a single city—say, San Francisco—which might not be at 
all representative of the population of all voters in the United States. We would 
prefer a random selection of voters from smaller political districts, perhaps states, 
allotting a specified number to each state. The information from the randomly selected 
subsamples drawn from the respective states would be combined to form a prediction 
concerning the entire population of voters in the country. In general, we want to select 
a sample so as to obtain a specified quantity of information at minimum cost. 
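
The state-by-state scheme described above, commonly called stratified sampling although the text does not name it here, can be sketched as follows. Everything in the example is hypothetical: the state names, the voter identifiers, and the allotments are made-up toy data, and the function name is an assumption.

```python
import random


def stratified_sample(strata, allotment, seed=None):
    """Draw a simple random sample of the allotted size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allotment[name])
            for name, members in strata.items()}


# Hypothetical toy data: voter identifiers grouped by state.
strata = {
    "State A": list(range(0, 100)),
    "State B": list(range(100, 150)),
    "State C": list(range(150, 400)),
}
allotment = {"State A": 5, "State B": 3, "State C": 10}

subsamples = stratified_sample(strata, allotment, seed=1)
for state, voters in subsamples.items():
    print(state, len(voters))
```

How the subsamples are combined into a single prediction is a separate question that this sketch does not address.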


Summary 


This chapter has been concerned with providing a model for the repetition of an 
experiment and, consequently, a model for the population frequency distributions of 
Chapter 1. The acquisition of a probability distribution is the first step in forming a 
theory to model reality and to develop the machinery for making inferences. 

An experiment was defined as the process of making an observation. The concepts 
of an event, a simple event, the sample space, and the probability axioms have provided 
a probabilistic model for calculating the probability of an event. Numerical events 
and the definition of a random variable were introduced in Section 2.11. 

Inherent in the model is the sample-point approach for calculating the probability 
of an event (Section 2.5). Counting rules useful in applying the sample-point method 
were discussed in Section 2.6. The concept of conditional probability, the operations 
of set algebra, and the laws of probability set the stage for the event-composition 
method for calculating the probability of an event (Section 2.9). 

Of what value is the theory of probability? It provides the theory and the tools
for calculating the probabilities of numerical events and hence the probability
distributions for the random variables that will be discussed in Chapter 3. The numerical
events of interest to us appear in a sample, and we will wish to calculate the
probability of an observed sample to make an inference about the target population.
Probability provides both the foundation and the tools for statistical inference, the
objective of statistics.


Supplementary Exercises


2.143 Show that Theorem 2.7 holds for conditional probabilities. That is, if P(B) > 0, then
$P(A|B) = 1 - P(\bar{A}|B)$.


2.144 Let S contain four sample points, E1, E2, E3, and E4.


a List all possible events in S (include the null event).

b In Exercise 2.68(d), you showed that $\sum_{i=0}^{n} \binom{n}{i} = 2^n$. Use this result to give the total number
of events in S.

c Let A and B be the events {E1, E2, E3} and {E2, E4}, respectively. Give the sample points
in the following events: $A \cup B$, $A \cap B$, $\bar{A} \cap \bar{B}$, and $\bar{A} \cup B$.


2.145 A patient receiving a yearly physical examination must have 18 checks or tests performed. The
sequence in which the tests are conducted is important because the time lost between tests will
vary depending on the sequence. If an efficiency expert were to study the sequences to find the
one that required the minimum length of time, how many sequences would be included in her
study if all possible sequences were admissible?


2.146 Five cards are drawn from a standard 52-card playing deck. What is the probability that all 5
cards will be of the same suit?


2.147 Refer to Exercise 2.146. A gambler has been dealt five cards: two aces, one king, one 5, and
one 9. He discards the 5 and the 9 and is dealt two more cards. What is the probability that he
ends up with a full house?
A bin contains three components from supplier A, four from supplier B, and five from supplier
C. If four of the components are randomly selected for testing, what is the probability that each 
supplier would have at least one component tested? 


A large group of people is to be checked for two common symptoms of a certain disease. It 
is thought that 20% of the people possess symptom A alone, 30% possess symptom B alone, 
10% possess both symptoms, and the remainder have neither symptom. For one person chosen
at random from this group, find these probabilities: 


a The person has neither symptom. 
b The person has at least one symptom. 


c The person has both symptoms, given that he has symptom B.


Refer to Exercise 2.149. Let the random variable Y represent the number of symptoms possessed
by a person chosen at random from the group. Compute the probabilities associated with each 
value of Y. 


A Model for the World Series Two teams A and B play a series of games until one team 
wins four games. We assume that the games are played independently and that the probability 
that A wins any game is p. What is the probability that the series lasts exactly five games? 


We know the following about a colorimetric method used to test lake water for nitrates. If
water specimens contain nitrates, a solution dropped into the water will cause the specimen to
turn red 95% of the time. When used on water specimens without nitrates, the solution causes
the water to turn red 10% of the time (because chemicals other than nitrates are sometimes
present and they also react to the agent). Past experience in a lab indicates that nitrates are 
contained in 30% of the water specimens that are sent to the lab for testing. If a water specimen 
is randomly selected 


a from among those sent to the lab, what is the probability that it will turn red when tested? 
b and turns red when tested, what is the probability that it actually contains nitrates? 


Medical case histories indicate that different illnesses may produce identical symptoms. Suppose
that a particular set of symptoms, denoted H, occurs only when any one of three illnesses,
I1, I2, or I3, occurs. Assume that the simultaneous occurrence of more than one of these illnesses
is impossible and that


P(I1) = .01, P(I2) = .005, P(I3) = .02.


The probabilities of developing the set of symptoms H, given each of these illnesses, are known
to be


P(H|I1) = .90, P(H|I2) = .95, P(H|I3) = .75.


Assuming that an ill person exhibits the symptoms, H, what is the probability that the person
has illness I1?


a A drawer contains n = 5 different and distinguishable pairs of socks (a total of ten socks).
If a person (perhaps in the dark) randomly selects four socks, what is the probability that 
there is no matching pair in the sample? 

*b A drawer contains n different and distinguishable pairs of socks (a total of 2n socks). A
person randomly selects 2r of the socks, where 2r < n. In terms of n and r, what is the 
probability that there is no matching pair in the sample? 


A group of men possesses the three characteristics of being married (A), having a college 
degree (B), and being a citizen of a specified state (C), according to the fractions given in the 
accompanying Venn diagram. That is, 5% of the men possess all three characteristics, whereas 
20% have a college education but are not married and are not citizens of the specified state.
One man is chosen at random from this group.


[Figure: Venn diagram of events A, B, and C giving the fraction of men in each region; the values are not recoverable from this copy.]


Find the probability that he


a is married.
b has a college degree and is married.
c is not from the specified state but is married and has a college degree.
d is not married or does not have a college degree, given that he is from the specified state.


The accompanying table lists accidental deaths by age and certain specific types for the United 
States in 2002. 


a A randomly selected person from the United States was known to have an accidental death
in 2002. Find the probability that 


i he was over the age of 15 years. 

ii the cause of death was a motor vehicle accident.

iii the cause of death was a motor vehicle accident, given that the person was between 15
and 24 years old. 

iv the cause of death was a drowning accident, given that it was not a motor vehicle 
accident and the person was 34 years old or younger. 

b From these figures can you determine the probability that a person selected at random from 
the U.S. population had a fatal motor vehicle accident in 2002? 


                               Type of Accident
Age            All Types   Motor Vehicle    Falls   Drowning

Under 5            2,707             819       44        568
5-14               2,979           1,772       37        375
15-24             14,113          10,560      237        646
25-34             11,769           6,884      303        419
35-44             15,413           6,927      608        480
45-54             12,278           5,361      871        354
55-64              7,505           3,506      949        217
65-74              7,698           3,038    1,660        179
75 and over       23,438           4,487    8,613        244
Total             97,900          43,354   13,322      3,482


Source: Compiled from National Vital Statistics Report 50, no. 15, 2002. 
2.157 A study of the residents of a region showed that 20% were smokers. The probability of death
due to lung cancer, given that a person smoked, was ten times the probability of death due to 
lung cancer, given that the person did not smoke. If the probability of death due to lung cancer 
in the region is .006, what is the probability of death due to lung cancer given that the person 
is a smoker? 


2.158 A bowl contains w white balls and b black balls. One ball is selected at random from the bowl, 
its color is noted, and it is returned to the bowl along with n additional balls of the same color. 
Another single ball is randomly selected from the bowl (now containing w + b + n balls) and 
it is observed that the ball is black. Show that the (conditional) probability that the first ball
selected was white is $\frac{w}{w + b + n}$.

2.159 It seems obvious that P(∅) = 0. Show that this result follows from the axioms in Definition 2.6.


2.160 A machine for producing a new experimental electronic component generates defectives from 
time to time in a random manner. The supervising engineer for a particular machine has 
noticed that defectives seem to be grouping (hence appearing in a nonrandom manner), thereby 
suggesting a malfunction in some part of the machine. One test for nonrandomness is based 
on the number of runs of defectives and nondefectives (a run is an unbroken sequence of 
either defectives or nondefectives). The smaller the number of runs, the greater will be the 
amount of evidence indicating nonrandomness. Of 12 components drawn from the machine, 
the first 10 were not defective, and the last 2 were defective (N N N N N N N N N N D D). Assume
randomness. What is the probability of observing 


a this arrangement (resulting in two runs) given that 10 of the 12 components are not defec- 
tive? 


b two runs? 


2.161 Refer to Exercise 2.160. What is the probability that the number of runs, R, is less than or 
equal to 3? 


2.162 Assume that there are nine parking spaces next to one another in a parking lot. Nine cars need to 
be parked by an attendant. Three of the cars are expensive sports cars, three are large domestic 
cars, and three are imported compacts. Assuming that the attendant parks the cars at random, 
what is the probability that the three expensive sports cars are parked adjacent to one another? 


2.163 Relays used in the construction of electric circuits function properly with probability .9.
Assuming that the circuits operate independently, which of the following circuit designs yields
the higher probability that current will flow when the relays are activated? 


[Figure: circuit diagrams for designs A and B; the layout is not recoverable from this copy.]


2.164 Refer to Exercise 2.163 and consider circuit A. If we know that current is flowing, what is the 
probability that switches 1 and 4 are functioning properly? 


2.165 Refer to Exercise 2.163 and consider circuit B. If we know that current is flowing, what is the 
probability that switches 1 and 4 are functioning properly? 


2.166 Eight tires of different brands are ranked from 1 to 8 (best to worst) according to mileage 
performance. If four of these tires are chosen at random by a customer, find the probability 
that the best tire among those selected by the customer is actually ranked third among the 
original eight. 
Refer to Exercise 2.166. Let Y denote the actual quality rank of the best tire selected by the
customer. In Exercise 2.166, you computed P(Y = 3). Give the possible values of Y and the 
probabilities associated with all of these values. 


As in Exercises 2.166 and 2.167, eight tires of different brands are ranked from 1 to 8 (best to 
worst) according to mileage performance. 


a If four of these tires are chosen at random by a customer, what is the probability that the 
best tire selected is ranked 3 and the worst is ranked 7? 

b In part (a) you computed the probability that the best tire selected is ranked 3 and the worst 
is ranked 7. If that is the case, the range of the ranks is R = largest rank - smallest rank
= 7 - 3 = 4. What is P(R = 4)?


c Give all possible values for R and the probabilities associated with all of these values.


Three beer drinkers (say I, II, and III) are to rank four different brands of beer (say A, B, C, 
and D) in a blindfold test. Each drinker ranks the four beers as 1 (for the beer that he or she 
liked best), 2 (for the next best), 3, or 4. 


a Carefully describe a sample space for this experiment (note that we need to specify the 
ranking of all four beers for all three drinkers). How many sample points are in this sample 
space? 

b Assume that the drinkers cannot discriminate between the beers so that each assignment 
of ranks to the beers is equally likely. After all the beers are ranked by all three drinkers, 
the ranks of each brand of beer are summed. What is the probability that some beer will 
receive a total rank of 4 or less? 


Three names are to be selected from a list of seven names for a public opinion survey. Find the 
probability that the first name on the list is selected for the survey. 


An AP news service story, printed in the Gainesville Sun on May 20, 1979, states the following 
with regard to debris from Skylab striking someone on the ground: “The odds are 1 in 150 that 
a piece of Skylab will hit someone. But 4 billion people ...live in the zone in which pieces 
could fall. So any one person's chances of being struck are one in 150 times 4 billion—or one 
in 600 billion.” Do you see any inaccuracies in this reasoning? 


Let A and B be any two events. Which of the following statements, in general, are false? 


a $P(A|B) + P(\bar{A}|\bar{B}) = 1$.
b $P(A|B) + P(A|\bar{B}) = 1$.
c $P(A|B) + P(\bar{A}|B) = 1$.


As items come to the end of a production line, an inspector chooses which items are to go 
through a complete inspection. Ten percent of all items produced are defective. Sixty percent 
of all defective items go through a complete inspection, and 20% of all good items go through 
a complete inspection. Given that an item is completely inspected, what is the probability it 
is defective? 


Many public schools are implementing a "no-pass, no-play" rule for athletes. Under this system,
a student who fails a course is disqualified from participating in extracurricular activities 
during the next grading period. Suppose that the probability is .15 that an athlete who has 
not previously been disqualified will be disqualified next term. For athletes who have been 
previously disqualified, the probability of disqualification next term is .5. If 30% of the athletes
have been disqualified in previous terms, what is the probability that a randomly selected athlete 
will be disqualified during the next grading period? 
Three events, A, B, and C, are said to be mutually independent if


$P(A \cap B) = P(A) \times P(B)$, $P(B \cap C) = P(B) \times P(C)$,
$P(A \cap C) = P(A) \times P(C)$, $P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$.


Suppose that a balanced coin is independently tossed two times. Define the following events: 


A: Head appears on the first toss. 
B: Head appears on the second toss. 
C: Both tosses yield the same outcome. 


Are A, B, and C mutually independent? 


Refer to Exercise 2.175 and suppose that events A, B, and C are mutually independent. 


a Show that (A ∪ B) and C are independent. 
b Show that A and (B ∩ C) are independent. 


Refer to Exercise 2.90(b) where a friend claimed that if there is a 1 in 50 chance of injury on 
a single jump then there is a 100% chance of injury if a skydiver jumps 50 times. Assume that 
the results of repeated jumps are mutually independent. 


a What is the probability that 50 jumps will be completed without an injury? 
b What is the probability that at least one injury will occur in 50 jumps? 


c What is the maximum number of jumps, n, the skydiver can make if the probability is at 
least .60 that all n jumps will be completed without injury? 


Suppose that the probability of exposure to the flu during an epidemic is .6. Experience has 
shown that a serum is 80% successful in preventing an inoculated person from acquiring the 
flu, if exposed to it. A person not inoculated faces a probability of .90 of acquiring the flu if 
exposed to it. Two persons, one inoculated and one not, perform a highly specialized task in a 
business. Assume that they are not at the same location, are not in contact with the same people, 
and cannot expose each other to the flu. What is the probability that at least one will get the flu? 


Two gamblers bet $1 each on the successive tosses of a coin. Each has a bank of $6. What is 
the probability that 


a they break even after six tosses of the coin? 


b one player, say Jones, wins all the money on the tenth toss of the coin? 


Suppose that the streets of a city are laid out in a grid with streets running north-south and 
east-west. Consider the following scheme for patrolling an area of 16 blocks by 16 blocks. An 
officer commences walking at the intersection in the center of the area. At the corner of each 
block the officer randomly elects to go north, south, east, or west. What is the probability that 
the officer will 


a reach the boundary of the patrol area after walking the first 8 blocks? 


b return to the starting point after walking exactly 4 blocks? 


Suppose that n indistinguishable balls are to be arranged in N distinguishable boxes so that 
each distinguishable arrangement is equally likely. If n > N, show that the probability no box 
will be empty is given by 

$$\frac{\binom{n-1}{N-1}}{\binom{N+n-1}{N-1}}.$$


Basic Definition 


As stated in Section 2.12, a random variable is a real-valued function defined over 
a sample space. Consequently, a random variable can be used to identify numerical 
events that are of interest in an experiment. For example, the event of interest in an 
opinion poll regarding voter preferences is not usually the particular people sampled 
or the order in which preferences were obtained but Y = the number of voters favoring 
a certain candidate or issue. The observed value of this random variable must be zero 
or an integer between 1 and the sample size. Thus, this random variable can take on 
only a finite number of values with nonzero probability. A random variable of this 
type is said to be discrete. 


DEFINITION 3.1 A random variable Y is said to be discrete if it can assume only a finite or 
countably infinite¹ number of distinct values. 


A less formidable characterization of discrete random variables can be obtained 
by considering some practical examples. The number of bacteria per unit area in 
the study of drug control on bacterial growth is a discrete random variable, as is the 
number of defective television sets in a shipment of 100 sets. Indeed, discrete random 
variables often represent counts associated with real phenomena. 

Let us now consider the relation of the material in Chapter 2 to this chapter. Why 
study the theory of probability? The answer is that the probability of an observed 
event is needed to make inferences about a population. The events of interest are often 
numerical events that correspond to values of discrete random variables. Hence, it is 
imperative that we know the probabilities of these numerical events. Because certain 
types of random variables occur so frequently in practice, it is useful to have at hand 
the probability for each value of a random variable. This collection of probabilities is 
called the probability distribution of the discrete random variable. We will find that 
many experiments exhibit similar characteristics and generate random variables with 
the same type of probability distribution. Consequently, knowledge of the probability 
distributions for random variables associated with common types of experiments will 
eliminate the need for solving the same probability problems over and over again. 


The Probability Distribution for a Discrete Random Variable 


Notationally, we will use an uppercase letter, such as Y, to denote a random variable 
and a lowercase letter, such as y, to denote a particular value that a random variable 
may assume. For example, let Y denote any one of the six possible values that could 
be observed on the upper face when a die is tossed. After the die is tossed, the number 
actually observed will be denoted by the symbol y. Note that Y is a random variable, 
but the specific observed value, y, is not random. 

The expression (Y = y) can be read, the set of all points in S assigned the value 
y by the random variable Y. 

It is now meaningful to talk about the probability that Y takes on the value y, 
denoted by P(Y = y). As in Section 2.11, this probability is defined as the sum of 
the probabilities of appropriate sample points in S. 


1. Recall that a set of elements is countably infinite if the elements in the set can be put into one-to-one 
correspondence with the positive integers. 


Discrete Random Variables and Their Probability Distributions 


DEFINITION 3.2 The probability that Y takes on the value y, P(Y = y), is defined as the sum 
of the probabilities of all sample points in S that are assigned the value y. We 
will sometimes denote P(Y = y) by p(y). 


Because p(y) is a function that assigns probabilities to each value y of the random 
variable Y, it is sometimes called the probability function for Y. 


DEFINITION 3.3 The probability distribution for a discrete variable Y can be represented by a 
formula, a table, or a graph that provides p(y) = P(Y = y) for all y. 


Notice that p(y) ≥ 0 for all y, but the probability distribution for a discrete random 
variable assigns nonzero probabilities to only a countable number of distinct y values. 
Any value y not explicitly assigned a positive probability is understood to be such 
that p(y) = 0. We illustrate these ideas with an example. 


EXAMPLE 3.1 A supervisor in a manufacturing plant has three men and three women working for 
him. He wants to choose two workers for a special job. Not wishing to show any 
biases in his selection, he decides to select the two workers at random. Let Y denote 
the number of women in his selection. Find the probability distribution for Y. 


Solution The supervisor can select two workers from six in $\binom{6}{2} = 15$ ways. Hence, S contains 
15 sample points, which we assume to be equally likely because random sampling 
was employed. Thus, $P(E_i) = 1/15$, for i = 1, 2, ..., 15. The values for Y that have 
nonzero probability are 0, 1, and 2. The number of ways of selecting Y = 0 women 
is $\binom{3}{0}\binom{3}{2}$ because the supervisor must select zero workers from the three women and 
two from the three men. Thus, there are $\binom{3}{0}\binom{3}{2} = 1 \cdot 3 = 3$ sample points in the event 
Y = 0, and 

$$p(0) = P(Y = 0) = \frac{\binom{3}{0}\binom{3}{2}}{15} = \frac{3}{15} = \frac{1}{5}.$$

Similarly, 

$$p(1) = P(Y = 1) = \frac{\binom{3}{1}\binom{3}{1}}{15} = \frac{9}{15} = \frac{3}{5},$$

$$p(2) = P(Y = 2) = \frac{\binom{3}{2}\binom{3}{0}}{15} = \frac{3}{15} = \frac{1}{5}.$$

Notice that (Y = 1) is by far the most likely outcome. This should seem reasonable 
since the number of women equals the number of men in the original group. 
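
The counting argument of Example 3.1 can be checked with a short Python sketch. This is only an illustration, assuming Python 3.8 or later for `math.comb`; the function name `women_selected_pmf` and its parameters are hypothetical and not part of the text.

```python
from fractions import Fraction
from math import comb


def women_selected_pmf(women=3, men=3, chosen=2):
    """Return {y: p(y)} for Y = number of women among the chosen workers."""
    total = comb(women + men, chosen)  # number of equally likely samples
    return {
        # ways to pick y women and (chosen - y) men, divided by all samples
        y: Fraction(comb(women, y) * comb(men, chosen - y), total)
        for y in range(max(0, chosen - men), min(women, chosen) + 1)
    }


pmf = women_selected_pmf()
for y, p in pmf.items():
    print(y, p)  # prints 0 1/5, then 1 3/5, then 2 1/5
```

Exact fractions are used rather than floating-point numbers so that the results can be compared directly with the values 1/5, 3/5, and 1/5 derived above.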


The table for the probability distribution of the random variable Y considered in 
Example 3.1 is summarized in Table 3.1. The same distribution is given in graphical 
form in Figure 3.1. If we regard the width at each bar in Figure 3.1 as one unit, then 
the area in a bar is equal to the probability that Y takes on the value over which the 
bar is centered. This concept of areas representing probabilities was introduced in 
Section 1.2. 


Table 3.1 Probability distribution for Example 3.1 

y    p(y) 
0    1/5 
1    3/5 
2    1/5 


FIGURE 3.1 Probability histogram for Table 3.1 (vertical axis p(y), horizontal axis y) 

The most concise method of representing discrete probability distributions is by 
means of a formula. For Example 3.1 we see that the formula for p(y) can be written as 


$$p(y) = \frac{\binom{3}{y}\binom{3}{2-y}}{\binom{6}{2}}, \qquad y = 0, 1, 2.$$

Notice that the probabilities associated with all distinct values of a discrete random 
variable must sum to 1. In summary, the following properties must hold for any discrete 
probability distribution: 


THEOREM 3.1 For any discrete probability distribution, the following must be true: 


1. $0 \le p(y) \le 1$ for all y. 
2. $\sum_{y} p(y) = 1$, where the summation is over all values of y with nonzero 
probability. 
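
The two properties above are easy to test mechanically. The following Python sketch is a hedged illustration; the helper name `is_valid_pmf` and the dictionary representation of p(y) are assumptions made here, not notation from the text.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check 0 <= p(y) <= 1 for every y and that the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    return in_range and sum(pmf.values()) == 1


table_3_1 = {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}
incomplete = {0: Fraction(1, 2), 1: Fraction(1, 4)}  # sums to 3/4, not 1

print(is_valid_pmf(table_3_1))   # True
print(is_valid_pmf(incomplete))  # False
```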


As mentioned in Section 1.5, the probability distributions we derive are models, not 
exact representations, for the frequency distributions of populations of real data that 
occur (or would be generated) in nature. Thus, they are models for real distributions 
of data similar to the distributions discussed in Chapter 1. For example, if we were to 
randomly select two workers from among the six described in Example 3.1, we would 
observe a single y value. In this instance the observed y value would be 0, 1, or 2. 
If the experiment were repeated many times, many y values would be generated. A 
relative frequency histogram for the resulting data, constructed in the manner described 
in Chapter 1, would be very similar to the probability histogram of Figure 3.1. 
Such simulation studies are very useful. By repeating some experiments over and 
over again, we can generate measurements of discrete random variables that possess 
frequency distributions very similar to the probability distributions derived in this 
chapter, reinforcing the conviction that our models are quite accurate. 
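
A simulation of this kind can be sketched in Python for the worker-selection experiment of Example 3.1. The number of repetitions, the fixed seed, and the name `simulate_women_selected` are illustrative assumptions only; any large number of repetitions should behave similarly.

```python
import random
from collections import Counter


def simulate_women_selected(repetitions=100_000, seed=1):
    """Estimate p(y) by repeatedly drawing 2 of 6 workers at random."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    workers = ["W", "W", "W", "M", "M", "M"]
    counts = Counter(
        rng.sample(workers, 2).count("W") for _ in range(repetitions)
    )
    # relative frequency of each observed value of Y
    return {y: counts[y] / repetitions for y in (0, 1, 2)}


estimates = simulate_women_selected()
exact = {0: 0.2, 1: 0.6, 2: 0.2}  # p(y) from Table 3.1
for y in (0, 1, 2):
    print(y, round(estimates[y], 3), exact[y])
```

With this many repetitions the relative frequencies should land close to 1/5, 3/5, and 1/5, which is the sense in which the probability distribution models the frequency distribution.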


Exercises 


When the health department tested private wells in a county for two impurities commonly found 
in drinking water, it found that 20% of the wells had neither impurity, 40% had impurity A, 
and 50% had impurity B. (Obviously, some had both impurities.) If a well is randomly chosen 
from those in the county, find the probability distribution for Y, the number of impurities found 
in the well. 


You and a friend play a game where you each toss a balanced coin. If the upper faces on the 
coins are both tails, you win $1; if the faces are both heads, you win $2; if the coins do not match 
(one shows a head, the other a tail), you lose $1 (win (—$1)). Give the probability distribution 
for your winnings, Y, on a single play of this game. 


A group of four components is known to contain two defectives. An inspector tests the components 
one at a time until the two defectives are located. Once she locates the two defectives, she 
stops testing, but the second defective is tested to ensure accuracy. Let Y denote the number of 
the test on which the second defective is found. Find the probability distribution for Y. 


Consider a system of water flowing through valves from A to B. (See the accompanying 
diagram.) Valves 1, 2, and 3 operate independently, and each correctly opens on signal with 
probability .8. Find the probability distribution for Y, the number of open paths from A to B 
after the signal is given. (Note that Y can take on the values 0, 1, and 2.) 


[Diagram: valve 1 on the upper path from A to B; valves 2 and 3 on the lower path.] 


A problem in a test given to small children asks them to match each of three pictures of animals 
to the word identifying that animal. If a child assigns the three words at random to the three 
pictures, find the probability distribution for Y, the number of correct matches. 


Five balls, numbered 1, 2, 3, 4, and 5, are placed in an urn. Two balls are randomly selected 
from the five, and their numbers noted. Find the probability distribution for the following: 


a The largest of the two sampled numbers 


b The sum of the two sampled numbers 


Each of three balls is randomly placed into one of three bowls. Find the probability distribution 
for Y = the number of empty bowls. 


A single cell can either die, with probability .1, or split into two cells, with probability .9, 
producing a new generation of cells. Each cell in the new generation dies or splits into two cells 
independently with the same probabilities as the initial cell. Find the probability distribution 
for the number of cells in the next generation. 


In order to verify the accuracy of their financial accounts, companies use auditors on a regular 
basis to verify accounting entries. The company's employees make erroneous entries 5% of 
the time. Suppose that an auditor randomly checks three entries. 


a Find the probability distribution for Y, the number of errors detected by the auditor. 
b Construct a probability histogram for p(y). 


c Find the probability that the auditor will detect more than one error. 


A rental agency, which leases heavy equipment by the day, has found that one expensive piece 
of equipment is leased, on the average, only one day in five. If rental on one day is independent 
of rental on any other day, find the probability distribution of Y, the number of days between 
a pair of rentals. 


Persons entering a blood bank are such that 1 in 3 have type O⁺ blood and 1 in 15 have type O⁻ 
blood. Consider three randomly selected donors for the blood bank. Let X denote the number of 
donors with type O⁺ blood and Y denote the number with type O⁻ blood. Find the probability 
distributions for X and Y. Also find the probability distribution for X + Y, the number of 
donors who have type O blood. 


The Expected Value of a Random Variable or a Function of a Random Variable 


We have observed that the probability distribution for a random variable is a theoretical 
model for the empirical distribution of data associated with a real population. If 
the model is an accurate representation of nature, the theoretical and empirical distributions 
are equivalent. Consequently, as in Chapter 1, we attempt to find the mean 
and the variance for a random variable and thereby to acquire numerical descriptive 
measures, parameters, for the probability distribution p(y) that are consistent with 
those discussed in Chapter 1. 


DEFINITION 3.4 Let Y be a discrete random variable with the probability function p(y). Then 
the expected value of Y, E(Y), is defined to be² 

$$E(Y) = \sum_{y} y\,p(y).$$

If p(y) is an accurate characterization of the population frequency distribution, 
then E(Y) = μ, the population mean. 

Definition 3.4 is completely consistent with the definition of the mean of a set of 
measurements that was given in Definition 1.1. For example, consider a discrete 
random variable Y that can assume values 0, 1, and 2 with probability distribution 
p(y) as shown in Table 3.2 and the probability histogram shown in Figure 3.2. A 
visual inspection will reveal the mean of the distribution to be located at y = 1. 


2. To be precise, the expected value of a discrete random variable is said to exist if the sum, as given 
earlier, is absolutely convergent, that is, if 

$$\sum_{y} |y|\,p(y) < \infty.$$

This absolute convergence will hold for all examples in this text and will not be mentioned each time an 
expected value is defined. 


Table 3.2 Probability distribution for Y 

y    p(y) 
0    1/4 
1    1/2 
2    1/4 


FIGURE 3.2 Probability distribution for Y (probability histogram; vertical axis p(y), horizontal axis y) 

To show that $E(Y) = \sum_{y} y\,p(y)$ is the mean of the probability distribution p(y), 
suppose that the experiment were conducted 4 million times, yielding 4 million 
observed values for Y. Noting p(y) in Figure 3.2, we would expect approximately 
1 million of the 4 million repetitions to result in the outcome Y = 0, 2 million in 
Y = 1, and 1 million in Y = 2. To find the mean value of Y, we average these 4 million 
measurements and obtain 

$$\mu \approx \frac{\sum_{i=1}^{n} y_i}{n} = \frac{(1{,}000{,}000)(0) + (2{,}000{,}000)(1) + (1{,}000{,}000)(2)}{4{,}000{,}000}$$
$$= (0)(1/4) + (1)(1/2) + (2)(1/4)$$
$$= \sum_{y=0}^{2} y\,p(y) = 1.$$
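
The averaging argument above can be mirrored in a few lines of Python. This is a minimal sketch; the helper name `expected_value` and the dictionary layout for Table 3.2 are assumptions made for illustration.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(Y) = sum of y * p(y) over the values of Y (Definition 3.4)."""
    return sum(y * p for y, p in pmf.items())


table_3_2 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mean = expected_value(table_3_2)

# Average of the outcomes expected in 4 million repetitions of the experiment.
counts = {0: 1_000_000, 1: 2_000_000, 2: 1_000_000}
long_run_average = Fraction(
    sum(y * n for y, n in counts.items()), sum(counts.values())
)

print(mean, long_run_average)  # prints 1 1
```

Both computations give 1 because dividing each expected count by 4,000,000 simply reproduces p(y).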


Thus, E(Y) is an average, and Definition 3.4 is consistent with the definition of a 
mean given in Definition 1.1. Similarly, we frequently are interested in the mean or 
expected value of a function of a random variable Y. For example, molecules in space 
move at varying velocities, where Y, the velocity of a given molecule, is a random 
variable. The energy imparted upon impact by a moving body is proportional to the 
square of the velocity. Consequently, to find the mean amount of energy transmitted 
by a molecule upon impact, we must find the mean value of Y². More important, we 
note in Definition 1.2 that the variance of a set of measurements is the mean of the 
square of the differences between each value in the set of measurements and their 
mean, or the mean value of (Y − μ)². 


THEOREM 3.2 Let Y be a discrete random variable with probability function p(y) and g(Y) 
be a real-valued function of Y. Then the expected value of g(Y) is given by 

$$E[g(Y)] = \sum_{\text{all } y} g(y)\,p(y).$$

Proof We prove the result in the case where the random variable Y takes on the finite 
number of values $y_1, y_2, \ldots, y_n$. Because the function g(y) may not be one-to-one, 
suppose that g(Y) takes on values $g_1, g_2, \ldots, g_m$ (where m ≤ n). It 
follows that g(Y) is a random variable such that for i = 1, 2, ..., m, 

$$P[g(Y) = g_i] = \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} p(y_j) = p^*(g_i).$$

Thus, by Definition 3.4, 

$$E[g(Y)] = \sum_{i=1}^{m} g_i\,p^*(g_i)$$
$$= \sum_{i=1}^{m} g_i \Biggl\{ \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} p(y_j) \Biggr\}$$
$$= \sum_{i=1}^{m} \; \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} g_i\,p(y_j)$$
$$= \sum_{j=1}^{n} g(y_j)\,p(y_j).$$
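
The grouping step in the proof can be traced numerically. The sketch below uses the distribution of Table 3.2 and the function g(y) = (y − 1)², chosen here only because it is not one-to-one; the names `distribution_of_g`, `via_definition`, and `via_theorem` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def distribution_of_g(pmf, g):
    """Build p*(g_i) by summing p(y_j) over all y_j with g(y_j) = g_i."""
    p_star = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for y, p in pmf.items():
        p_star[g(y)] += p
    return dict(p_star)


def g(y):
    return (y - 1) ** 2  # not one-to-one: g(0) = g(2) = 1


table_3_2 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
p_star = distribution_of_g(table_3_2, g)

via_definition = sum(gi * p for gi, p in p_star.items())   # sum of g_i p*(g_i)
via_theorem = sum(g(y) * p for y, p in table_3_2.items())  # sum of g(y) p(y)
print(via_definition, via_theorem)  # prints 1/2 1/2
```

The first sum works with the m = 2 distinct values of g(Y); the second works with the n = 3 values of Y. Theorem 3.2 says the second form may be used without ever deriving p*.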


Now let us return to our immediate objective, finding numerical descriptive measures 
(or parameters) to characterize p(y). As previously discussed, E(Y) provides 
the mean of the population with distribution given by p(y). We next seek the variance 
and standard deviation of this population. You will recall from Chapter 1 that 
the variance of a set of measurements is the average of the square of the differences 
between the values in a set of measurements and their mean. Thus, we wish to find 
the mean value of the function g(Y) = (Y − μ)². 


DEFINITION 3.5 If Y is a random variable with mean E(Y) = μ, the variance of a random 
variable Y is defined to be the expected value of (Y − μ)². That is, 

$$V(Y) = E[(Y - \mu)^2].$$

The standard deviation of Y is the positive square root of V(Y). 


If p(y) is an accurate characterization of the population frequency distribution (and 
to simplify notation, we will assume this to be true), then E(Y) = μ, V(Y) = σ², 
the population variance, and σ is the population standard deviation. 


EXAMPLE 3.2 The probability distribution for a random variable Y is given in Table 3.3. Find the 
mean, variance, and standard deviation of Y. 


Table 3.3 Probability distribution for Y 

y    p(y) 
0    1/8 
1    1/4 
2    3/8 
3    1/4 


Solution By Definitions 3.4 and 3.5, 

$$\mu = E(Y) = \sum_{y=0}^{3} y\,p(y) = (0)(1/8) + (1)(1/4) + (2)(3/8) + (3)(1/4) = 1.75,$$

$$\sigma^2 = E[(Y - \mu)^2] = \sum_{y=0}^{3} (y - \mu)^2 p(y)$$
$$= (0 - 1.75)^2(1/8) + (1 - 1.75)^2(1/4) + (2 - 1.75)^2(3/8) + (3 - 1.75)^2(1/4)$$
$$= .9375,$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{.9375} = .97.$$

The probability histogram is shown in Figure 3.3. Locate μ on the axis of measurement, 
and observe that it does locate the “center” of the nonsymmetrical probability 
distribution of Y. Also notice that the interval (μ ± σ) contains the discrete points 
Y = 1 and Y = 2, which account for 5/8 of the probability. Thus, the empirical rule 
(Chapter 1) provides a reasonable approximation to the probability of a measurement 
falling in this interval. (Keep in mind that the probabilities are concentrated at the 
points Y = 0, 1, 2, and 3 because Y cannot take intermediate values.) 


FIGURE 3.3 Probability histogram for Example 3.2 (vertical axis p(y), horizontal axis y) 

It will be helpful to acquire a few additional tools and definitions before attempting 
to find the expected values and variances of more complicated discrete random 
variables, such as the binomial or Poisson. Hence, we present three useful expectation 
theorems that follow directly from the theory of summation. (Other useful techniques 
are presented in Sections 3.4 and 3.9.) For each theorem we assume that Y is a discrete 
random variable with probability function p(y). 

The first theorem states the rather obvious result that the mean or expected value 
of a nonrandom quantity c is equal to c. 


THEOREM 3.3 Let Y be a discrete random variable with probability function p(y) and c be a 
constant. Then E(c) = c. 


Proof Consider the function g(Y) ≡ c. By Theorem 3.2, 

$$E(c) = \sum_{y} c\,p(y) = c \sum_{y} p(y).$$

But $\sum_{y} p(y) = 1$ (Theorem 3.1) and, hence, E(c) = c(1) = c. 


The second theorem states that the expected value of the product of a constant c 
times a function of a random variable is equal to the constant times the expected value 
of the function of the variable. 


THEOREM 3.4 Let Y be a discrete random variable with probability function p(y), g(Y) be a 
function of Y, and c be a constant. Then 

$$E[c\,g(Y)] = c\,E[g(Y)].$$


Proof By Theorem 3.2, 

$$E[c\,g(Y)] = \sum_{y} c\,g(y)\,p(y) = c \sum_{y} g(y)\,p(y) = c\,E[g(Y)].$$


The third theorem states that the mean or expected value of a sum of functions of 
a random variable Y is equal to the sum of their respective expected values. 


THEOREM 3.5 Let Y be a discrete random variable with probability function p(y) and $g_1(Y)$, 
$g_2(Y), \ldots, g_k(Y)$ be k functions of Y. Then 

$$E[g_1(Y) + g_2(Y) + \cdots + g_k(Y)] = E[g_1(Y)] + E[g_2(Y)] + \cdots + E[g_k(Y)].$$


Proof We will demonstrate the proof only for the case k = 2, but analogous steps will 
hold for any finite k. By Theorem 3.2, 

$$E[g_1(Y) + g_2(Y)] = \sum_{y} [g_1(y) + g_2(y)]\,p(y)$$
$$= \sum_{y} g_1(y)\,p(y) + \sum_{y} g_2(y)\,p(y)$$
$$= E[g_1(Y)] + E[g_2(Y)].$$


Theorems 3.3, 3.4, and 3.5 can be used immediately to develop a theorem useful 
in finding the variance of a discrete random variable. 


THEOREM 3.6 Let Y be a discrete random variable with probability function p(y) and mean 
E(Y) = μ; then 

$$V(Y) = \sigma^2 = E[(Y - \mu)^2] = E(Y^2) - \mu^2.$$


Proof 

$$\sigma^2 = E[(Y - \mu)^2] = E(Y^2 - 2\mu Y + \mu^2)$$
$$= E(Y^2) - E(2\mu Y) + E(\mu^2) \quad \text{(by Theorem 3.5).}$$

Noting that μ is a constant and applying Theorems 3.4 and 3.3 to the second 
and third terms, respectively, we have 

$$\sigma^2 = E(Y^2) - 2\mu E(Y) + \mu^2.$$

But μ = E(Y) and, therefore, 

$$\sigma^2 = E(Y^2) - 2\mu^2 + \mu^2 = E(Y^2) - \mu^2.$$


Theorem 3.6 often greatly reduces the labor in finding the variance of a discrete 
random variable. We will demonstrate the usefulness of this result by recomputing 
the variance of the random variable considered in Example 3.2. 


EXAMPLE 3.3 Use Theorem 3.6 to find the variance of the random variable Y in Example 3.2. 


Solution The mean μ = 1.75 was found in Example 3.2. Because 

$$E(Y^2) = \sum_{y} y^2 p(y) = (0)^2(1/8) + (1)^2(1/4) + (2)^2(3/8) + (3)^2(1/4) = 4,$$

Theorem 3.6 yields that 

$$\sigma^2 = E(Y^2) - \mu^2 = 4 - (1.75)^2 = .9375.$$
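
That the shortcut of Theorem 3.6 agrees with Definition 3.5 can be confirmed with exact arithmetic. The following Python sketch is illustrative only; the two function names are hypothetical, and the data are the probabilities of Table 3.3.

```python
from fractions import Fraction


def variance_by_definition(pmf):
    """V(Y) = E[(Y - mu)^2], as in Definition 3.5."""
    mu = sum(y * p for y, p in pmf.items())
    return sum((y - mu) ** 2 * p for y, p in pmf.items())


def variance_by_theorem_3_6(pmf):
    """V(Y) = E(Y^2) - mu^2, as in Theorem 3.6."""
    mu = sum(y * p for y, p in pmf.items())
    second_moment = sum(y ** 2 * p for y, p in pmf.items())  # E(Y^2)
    return second_moment - mu ** 2


table_3_3 = {
    0: Fraction(1, 8),
    1: Fraction(1, 4),
    2: Fraction(3, 8),
    3: Fraction(1, 4),
}
print(variance_by_definition(table_3_3), variance_by_theorem_3_6(table_3_3))
# prints 15/16 15/16, and 15/16 = .9375
```

The second form needs only one pass over squared values of y rather than squared deviations, which is where the saving in hand computation comes from.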


EXAMPLE 3.4 The manager of an industrial plant is planning to buy a new machine of either type A 
or type B. If t denotes the number of hours of daily operation, the number of daily 
repairs $Y_1$ required to maintain a machine of type A is a random variable with mean 
and variance both equal to .10t. The number of daily repairs $Y_2$ for a machine of 
type B is a random variable with mean and variance both equal to .12t. The daily 
cost of operating A is $C_A(t) = 10t + 30Y_1^2$; for B it is $C_B(t) = 8t + 30Y_2^2$. Assume 
that the repairs take negligible time and that each night the machines are tuned so 
that they operate essentially like new machines at the start of the next day. Which 
machine minimizes the expected daily cost if a workday consists of (a) 10 hours and 
(b) 20 hours? 


Solution The expected daily cost for A is 

$$E[C_A(t)] = E[10t + 30Y_1^2] = 10t + 30E(Y_1^2)$$
$$= 10t + 30\{V(Y_1) + [E(Y_1)]^2\} = 10t + 30[.10t + (.10t)^2]$$
$$= 13t + .3t^2.$$

In this calculation, we used the known values for $V(Y_1)$ and $E(Y_1)$ and the fact 
that $V(Y_1) = E(Y_1^2) - [E(Y_1)]^2$ to obtain that $E(Y_1^2) = V(Y_1) + [E(Y_1)]^2 = .10t + (.10t)^2$. Similarly, 

$$E[C_B(t)] = E[8t + 30Y_2^2] = 8t + 30E(Y_2^2)$$
$$= 8t + 30\{V(Y_2) + [E(Y_2)]^2\} = 8t + 30[.12t + (.12t)^2]$$
$$= 11.6t + .432t^2.$$

Thus, for scenario (a) where t = 10, 

$$E[C_A(10)] = 160 \quad \text{and} \quad E[C_B(10)] = 159.2,$$

which results in the choice of machine B. 
For scenario (b), t = 20 and 

$$E[C_A(20)] = 380 \quad \text{and} \quad E[C_B(20)] = 404.8,$$

resulting in the choice of machine A. 

In conclusion, machines of type B are more economical for short time periods 
because of their smaller hourly operating cost. For long time periods, however, machines 
of type A are more economical because they tend to be repaired less frequently. 
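
The comparison in Example 3.4 can be tabulated with a small Python sketch. The function name `expected_cost` and its parameter names are assumptions introduced here; the cost coefficients and repair rates are those stated in the example.

```python
def expected_cost(t, hourly_cost, repair_rate, repair_cost=30):
    """E[C(t)] = hourly_cost*t + repair_cost*E(Y^2), with E(Y) = V(Y) = repair_rate*t."""
    mean = variance = repair_rate * t
    second_moment = variance + mean ** 2  # E(Y^2) = V(Y) + [E(Y)]^2 (Theorem 3.6)
    return hourly_cost * t + repair_cost * second_moment


for t in (10, 20):
    cost_a = expected_cost(t, hourly_cost=10, repair_rate=0.10)
    cost_b = expected_cost(t, hourly_cost=8, repair_rate=0.12)
    cheaper = "A" if cost_a < cost_b else "B"
    print(t, round(cost_a, 2), round(cost_b, 2), cheaper)
```

Setting 13t + .3t² equal to 11.6t + .432t² gives t = 1.4/.132, or about 10.6 hours, as the workday length at which the two expected costs coincide; B is cheaper for shorter days and A for longer ones.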


The purpose of this section was to introduce the concept of an expected value and 
to develop some useful theorems for finding means and variances of random variables 
or functions of random variables. In the following sections, we present some specific 
types of discrete random variables and provide formulas for their probability distribu- 
tions and their means and variances. As you will see, actually deriving some of these 
expected values requires skill in the summation of algebraic series and knowledge of 
a few tricks. We will illustrate some of these tricks in some of the derivations in the 
upcoming sections. 


Exercises 


Let Y be a random variable with p(y) given in the accompanying table. Find E(Y), E(1/Y), 
E(Y² − 1), and V(Y). 


y    | 1   2   3   4 
p(y) | .4  .3  .2  .1 


Refer to the coin-tossing game in Exercise 3.2. Calculate the mean and variance of Y, your 
winnings on a single play of the game. Note that E(Y) > 0. How much should you pay to play 
this game if your net winnings, the difference between the payoff and cost of playing, are to 
have mean 0? 


The maximum patent life for a new drug is 17 years. Subtracting the length of time required by 
the FDA for testing and approval of the drug provides the actual patent life for the drug—that 
is, the length of time that the company has to recover research and development costs and to 
make a profit. The distribution of the lengths of actual patent lives for new drugs is given below: 


Years, y | 3    4    5    6    7    8    9    10   11   12   13 
p(y)     | .03  .05  .07  .10  .14  .20  .18  .12  .07  .03  .01 


a Find the mean patent life for a new drug. 
b Find the standard deviation of Y = the length of life of a randomly selected new drug. 
c What is the probability that the value of Y falls in the interval μ ± 2σ? 


An insurance company issues a one-year $1000 policy insuring against an occurrence A that 
historically happens to 2 out of every 100 owners of the policy. Administrative fees are $15 per 
policy and are not part of the company's “profit.” How much should the company charge for the 
policy if it requires that the expected profit per policy be $50? [Hint: If C is the premium for the 
policy, the company’s “profit” is C − 15 if A does not occur and C − 15 − 1000 if A does occur.] 


The secretary in Exercise 2.121 was given n computer passwords and tries the passwords at 
random. Exactly one password will permit access to a computer file. Find the mean and the 
variance of Y, the number of trials required to open the file, if unsuccessful passwords are 
eliminated (as in Exercise 2.121). 


Refer to Exercise 3.7. Find the mean and standard deviation for Y = the number of empty 
bowls. What is the probability that the value of Y falls within 2 standard deviations of the mean? 


Refer to Exercise 3.8. What is the mean number of cells in the second generation? 


Who is the king of late night TV? An Internet survey estimates that, when given a choice 
between David Letterman and Jay Leno, 52% of the population prefers to watch Jay Leno. 
Three late night TV watchers are randomly selected and asked which of the two talk show 
hosts they prefer. 


a Find the probability distribution for Y, the number of viewers in the sample who prefer 
Leno. 


b Construct a probability histogram for p(y). 
c What is the probability that exactly one of the three viewers prefers Leno? 
d What are the mean and standard deviation for Y? 
e What is the probability that the number of viewers favoring Leno falls within 2 standard 
deviations of the mean? 


A manufacturing company ships its product in two different sizes of truck trailers. Each shipment 
is made in a trailer with dimensions 8 feet × 10 feet × 30 feet or 8 feet × 10 feet × 40 feet. 
If 30% of its shipments are made by using 30-foot trailers and 70% by using 40-foot trailers, 
find the mean volume shipped per trailer load. (Assume that the trailers are always full.) 


The number N of residential homes that a fire company can serve depends on the distance r (in 
city blocks) that a fire engine can cover in a specified (fixed) period of time. If we assume that 
N is proportional to the area of a circle R blocks from the firehouse, then N = CπR², where C is 
a constant, π = 3.1416..., and R, a random variable, is the number of blocks that a fire engine 
can move in the specified time interval. For a particular fire company, C = 8, the probability 
distribution for R is as shown in the accompanying table, and p(r) = 0 for r ≤ 20 and r ≥ 27. 


r    | 21   22   23   24   25   26 
p(r) | .05  .20  .30  .25  .15  .05 


Find the expected value of N, the number of homes that the fire department can serve. 


A single fair die is tossed once. Let Y be the number facing up. Find the expected value and 
variance of Y. 


In a gambling game a person draws a single card from an ordinary 52-card playing deck. A 
person is paid $15 for drawing a jack or a queen and $5 for drawing a king or an ace. A person 
who draws any other card pays $4. If a person plays this game, what is the expected gain? 


Approximately 10% of the glass bottles coming off a production line have serious flaws in the 
glass. If two bottles are randomly selected, find the mean and variance of the number of bottles 
that have serious flaws. 


Two construction contracts are to be randomly assigned to one or more of three firms: I, II, 
and III. Any firm may receive both contracts. If each contract will yield a profit of $90,000 for 
the firm, find the expected profit for firm I. If firms I and II are actually owned by the same 
individual, what is the owner's expected total profit? 


*A heavy-equipment salesperson can contact either one or two customers per day with probability 
1/3 and 2/3, respectively. Each contact will result in either no sale or a $50,000 sale, 
with the probabilities .9 and .1, respectively. Give the probability distribution for daily sales. 
Find the mean and standard deviation of the daily sales. 


A potential customer for an $85,000 fire insurance policy possesses a home in an area that, according 
to experience, may sustain a total loss in a given year with probability of .001 and a 50% 
loss with probability .01. Ignoring all other partial losses, what premium should the insurance 
company charge for a yearly policy in order to break even on all $85,000 policies in this area? 


Refer to Exercise 3.3. If the cost of testing a component is $2 and the cost of repairing a 
defective is $4, find the expected total cost for testing and repairing the lot. 


If Y is a discrete random variable that assigns positive probabilities to only the positive integers, 
show that 


$$E(Y) = \sum_{k=1}^{\infty} P(Y \ge k).$$


Suppose that Y is a discrete random variable with mean μ and variance σ² and let X = Y + 1.


a Do you expect the mean of X to be larger than, smaller than, or equal to μ = E(Y)? Why?


b Use Theorems 3.3 and 3.5 to express E(X) = E(Y + 1) in terms of μ = E(Y). Does this
result agree with your answer to part (a)?


c Recalling that the variance is a measure of spread or dispersion, do you expect the variance
of X to be larger than, smaller than, or equal to σ² = V(Y)? Why?


3. Exercises preceded by an asterisk are optional.


d Use Definition 3.5 and the result in part (b) to show that
$$V(X) = E\{[X - E(X)]^2\} = E[(Y - \mu)^2] = \sigma^2;$$
that is, X = Y + 1 and Y have equal variances.


3.31 Suppose that Y is a discrete random variable with mean μ and variance σ² and let W = 2Y.


a Do you expect the mean of W to be larger than, smaller than, or equal to μ = E(Y)? Why?


b Use Theorem 3.4 to express E(W) = E(2Y) in terms of μ = E(Y). Does this result agree
with your answer to part (a)?


c Recalling that the variance is a measure of spread or dispersion, do you expect the variance
of W to be larger than, smaller than, or equal to σ² = V(Y)? Why?


d Use Definition 3.5 and the result in part (b) to show that
$$V(W) = E\{[W - E(W)]^2\} = E[4(Y - \mu)^2] = 4\sigma^2;$$
that is, W = 2Y has variance four times that of Y.


3.32 Suppose that Y is a discrete random variable with mean μ and variance σ² and let U = Y/10.


a Do you expect the mean of U to be larger than, smaller than, or equal to μ = E(Y)? Why?


b Use Theorem 3.4 to express E(U) = E(Y/10) in terms of μ = E(Y). Does this result
agree with your answer to part (a)?


c Recalling that the variance is a measure of spread or dispersion, do you expect the variance
of U to be larger than, smaller than, or equal to σ² = V(Y)? Why?


d Use Definition 3.5 and the result in part (b) to show that
$$V(U) = E\{[U - E(U)]^2\} = E[.01(Y - \mu)^2] = .01\sigma^2;$$
that is, U = Y/10 has variance .01 times that of Y.


3.33 Let Y be a discrete random variable with mean μ and variance σ². If a and b are constants,
use Theorems 3.3 through 3.6 to prove that


a E(aY + b) = aE(Y) + b = aμ + b.
b V(aY + b) = a²V(Y) = a²σ².


3.34 The manager of a stockroom in a factory has constructed the following probability distribution
for the daily demand (number of times used) for a particular tool.


y    | 0   1   2
p(y) | .1  .5  .4


It costs the factory $10 each time the tool is used. Find the mean and variance of the daily cost
for use of the tool.


3.4 The Binomial Probability Distribution


Some experiments consist of the observation of a sequence of identical and inde- 
pendent trials, each of which can result in one of two outcomes. Each item leaving 
a manufacturing production line is either defective or nondefective. Each shot in a 
sequence of firings at a target can result in a hit or a miss, and each of n persons
questioned prior to a local election either favors candidate Jones or does not. In this
section we are concerned with experiments, known as binomial experiments, that
exhibit the following characteristics.


DEFINITION 3.6

A binomial experiment possesses the following properties:


1. The experiment consists of a fixed number, n, of identical trials.

2. Each trial results in one of two outcomes: success, S, or failure, F.

3. The probability of success on a single trial is equal to some value p and
remains the same from trial to trial. The probability of a failure is equal to
q = (1 − p).

4. The trials are independent. 

5. The random variable of interest is Y, the number of successes observed 
during the n trials. 


Determining whether a particular experiment is a binomial experiment requires 
examining the experiment for each of the characteristics just listed. Notice that the 
random variable of interest is the number of successes observed in the n trials. It is
important to realize that a success is not necessarily “good” in the everyday sense of 
the word. In our discussions, success is merely a name for one of the two possible 
outcomes on a single trial of an experiment. 


EXAMPLE 3.5

An early-warning detection system for aircraft consists of four identical radar units
operating independently of one another. Suppose that each has a probability of .95
of detecting an intruding aircraft. When an intruding aircraft enters the scene, the
random variable of interest is Y, the number of radar units that do not detect the
plane. Is this a binomial experiment?


Solution

To decide whether this is a binomial experiment, we must determine whether each
of the five requirements in Definition 3.6 is met. Notice that the random variable of
interest is Y, the number of radar units that do not detect an aircraft. The random
variable of interest in a binomial experiment is always the number of successes;
consequently, the present experiment can be binomial only if we call the event do not
detect a success. We now examine the experiment for the five characteristics of the
binomial experiment.


1. The experiment involves four identical trials. Each trial consists of determining
whether (or not) a particular radar unit detects the aircraft.

2. Each trial results in one of two outcomes. Because the random variable of
interest is the number of successes, S denotes that the aircraft was not detected,
and F denotes that it was detected.

3. Because all the radar units detect aircraft with equal probability, the probability
of an S on each trial is the same, and p = P(S) = P(do not detect) = .05.

4. The trials are independent because the units operate independently.

5. The random variable of interest is Y, the number of successes in four trials.


Thus, the experiment is a binomial experiment, with n = 4, p = .05, and q =
1 − .05 = .95.


EXAMPLE 3.6

Suppose that 40% of a large population of registered voters favor candidate Jones.
A random sample of n = 10 voters will be selected, and Y, the number favoring
Jones, is to be observed. Does this experiment meet the requirements of a binomial
experiment?


Solution

If each of the ten people is selected at random from the population, then we have 
ten nearly identical trials, with each trial resulting in a person either favoring Jones 
(S) or not favoring Jones (F). The random variable of interest is then the number of 
successes in the ten trials. For the first person selected, the probability of favoring 
Jones (S) is .4. But what can be said about the unconditional probability that the 
second person will favor Jones? In Exercise 3.35 you will show that unconditionally 
the probability that the second person favors Jones is also .4. Thus, the probability of 
a success S stays the same from trial to trial. However, the conditional probability of
a success on later trials depends on the number of successes in the previous trials. If 
the population of voters is large, removal of one person will not substantially change 
the fraction of voters favoring Jones, and the conditional probability that the second 
person favors Jones will be very close to .4. In general, if the population is large and 
the sample size is relatively small, the conditional probability of success on a later 
trial given the number of successes on the previous trials will stay approximately 
the same regardless of the outcomes on previous trials. Thus, the trials will be ap- 
proximately independent and so sampling problems of this type are approximately 
binomial. 
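
The effect of population size on the conditional probabilities discussed in Example 3.6 can be checked numerically. The sketch below is only illustrative: the population sizes (50, 1000, and 100000) are hypothetical choices, not values from the text, and the helper name `second_draw_probs` is introduced here for the illustration. It assumes sampling without replacement from a population in which 40% favor Jones.

```python
from fractions import Fraction

def second_draw_probs(pop_size, n_success):
    """Return (P(S2), P(S2 | S1), P(S2 | F1)) when two people are drawn
    without replacement from a population containing n_success supporters."""
    p_first = Fraction(n_success, pop_size)
    given_s = Fraction(n_success - 1, pop_size - 1)  # first draw was a success
    given_f = Fraction(n_success, pop_size - 1)      # first draw was a failure
    unconditional = p_first * given_s + (1 - p_first) * given_f
    return unconditional, given_s, given_f

# Hypothetical population sizes, each with 40% supporters.
for pop_size in (50, 1000, 100000):
    uncond, given_s, given_f = second_draw_probs(pop_size, pop_size * 2 // 5)
    print(pop_size, float(uncond), round(float(given_s), 5), round(float(given_f), 5))
```

Under these assumptions the unconditional probability stays at exactly .4 for every population size, while the two conditional probabilities move toward .4 as the population grows, which is the sense in which the trials are approximately independent.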


If the sample size in Example 3.6 was large relative to the population size (say, 10% 
of the population), the conditional probability of selecting a supporter of Jones on a 
later selection would be significantly altered by the preferences of persons selected 
earlier in the experiment, and the experiment would not be binomial. The hypergeo- 
metric probability distribution, the topic of Section 3.7, is the appropriate probability 
model to be used when the sample size is large relative to the population size. 

You may wish to refine your ability to identify binomial experiments by reexamin- 
ing the exercises at the end of Chapter 2. Several of the experiments in those exercises 
are binomial or approximately binomial experiments. 

The binomial probability distribution p(y) can be derived by applying the sample-
point approach to find the probability that the experiment yields y successes. Each
sample point in the sample space can be characterized by an n-tuple involving the
letters S and F, corresponding to success and failure. A typical sample point would
thus appear as


$$\underbrace{SSFSFFFSFS \ldots FS}_{n \text{ positions}},$$


where the letter in the ith position (proceeding from left to right) indicates the outcome 
of the ith trial. 

Now let us consider a particular sample point corresponding to y successes and
hence contained in the numerical event Y = y. This sample point,


$$\underbrace{SSSSS \ldots SSS}_{y}\ \underbrace{FFF \ldots FF}_{n-y},$$


represents the intersection of n independent events (the outcomes of the n trials), 
in which there were y successes followed by (n — y) failures. Because the trials 
were independent and the probability of S, p, stays the same from trial to trial, the 
probability of this sample point is


$$\underbrace{ppppp \cdots ppp}_{y \text{ terms}}\ \underbrace{qqq \cdots qq}_{n-y \text{ terms}} = p^y q^{n-y}.$$


Every other sample point in the event Y = y can be represented as an n-tuple 
containing y S’s and (n — y) F’s in some order. Any such sample point also has 
probability $p^y q^{n-y}$. Because the number of distinct n-tuples that contain y S's and
(n − y) F's is (from Theorem 2.3)


$$\binom{n}{y} = \frac{n!}{y!(n-y)!},$$


it follows that the event (Y = y) is made up of $\binom{n}{y}$ sample points, each with probability
$p^y q^{n-y}$, and that $p(y) = \binom{n}{y} p^y q^{n-y}$, y = 0, 1, 2, ..., n. The result that we have
just derived is the formula for the binomial probability distribution.
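
The sample-point argument above can be mirrored in a few lines of Python. This is a minimal sketch, not part of the original text: the function names `binomial_pmf` and `pmf_by_sample_points` are introduced here for illustration, and the values n = 4, p = .05 are borrowed from Example 3.5. The second function lists every n-tuple of S's and F's and adds the probabilities of those with exactly y S's, so it should agree with the closed-form expression.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(y, n, p):
    """Closed form: p(y) = C(n, y) * p**y * q**(n - y)."""
    q = 1 - p
    return comb(n, y) * p**y * q**(n - y)

def pmf_by_sample_points(y, n, p):
    """Sum the probabilities of all n-tuples of S/F containing exactly y S's."""
    q = 1 - p
    total = 0.0
    for point in product("SF", repeat=n):
        if point.count("S") == y:
            total += p**y * q**(n - y)  # every such point has the same probability
    return total

n, p = 4, 0.05  # values taken from Example 3.5
for y in range(n + 1):
    assert isclose(binomial_pmf(y, n, p), pmf_by_sample_points(y, n, p))
    print(y, round(binomial_pmf(y, n, p), 6))
```

Enumerating all 2^n sample points is practical only for small n; the closed form is what one would use in general.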


DEFINITION 3.7

A random variable Y is said to have a binomial distribution based on n trials
with success probability p if and only if


$$p(y) = \binom{n}{y} p^y q^{n-y}, \qquad y = 0, 1, 2, \ldots, n \ \text{ and } \ 0 \le p \le 1.$$


Figure 3.4 portrays p(y) graphically as probability histograms, the first for n = 10,
p = .1; the second for n = 10, p = .5; and the third for n = 20, p = .5. Before we
proceed, let us reconsider the representation for the sample points in this experiment.
We have seen that a sample point can be represented by a sequence of n letters, each
of which is either S or F. If the sample point contains exactly one S, the probability
associated with that sample point is $pq^{n-1}$. If another sample point contains 2
S’s and (n − 2) F’s, the probability of this sample point is $p^2 q^{n-2}$. Notice that the
sample points for a binomial experiment are not equiprobable unless p = .5.
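
As a rough stand-in for the histograms of Figure 3.4, the following sketch prints text histograms of p(y) for the three (n, p) pairs named above. The helper names and the bar width of 50 characters are choices made for this illustration only.

```python
from math import comb

def binomial_pmf(y, n, p):
    """Binomial probability p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def text_histogram(n, p, width=50):
    """Return one line per value of y: the value, p(y), and a bar of '#' marks."""
    lines = []
    for y in range(n + 1):
        prob = binomial_pmf(y, n, p)
        bar = "#" * round(prob * width)  # bar length proportional to p(y)
        lines.append(f"{y:2d} {prob:.3f} {bar}")
    return lines

# The three cases shown in Figure 3.4.
for n, p in [(10, 0.1), (10, 0.5), (20, 0.5)]:
    print(f"n = {n}, p = {p}")
    print("\n".join(text_histogram(n, p)))
    print()
```

The output should show the skew toward small y when p = .1 and the symmetry about n/2 when p = .5.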

The term binomial experiment derives from the fact that each trial results in one of two
possible outcomes and that the probabilities p(y), y = 0, 1, 2, ..., n, are terms of
the binomial expansion


$$(q + p)^n = \binom{n}{0} q^n + \binom{n}{1} p^1 q^{n-1} + \binom{n}{2} p^2 q^{n-2} + \cdots + \binom{n}{n} p^n.$$


You will observe that $\binom{n}{0} q^n = p(0)$, $\binom{n}{1} p^1 q^{n-1} = p(1)$, and, in general, $p(y) = \binom{n}{y} p^y q^{n-y}$.
It also follows that p(y) satisfies the necessary properties for a probability
function because p(y) is positive for y = 0, 1, ..., n and [because (q + p) = 1]


$$\sum_{y} p(y) = \sum_{y=0}^{n} \binom{n}{y} p^y q^{n-y} = (q + p)^n = 1^n = 1.$$


[FIGURE 3.4: Binomial probability histograms]


The binomial probability distribution has many applications because the binomial
experiment occurs in sampling for defectives in industrial quality control, in the 
sampling of consumer preference or voting populations, and in many other physical 
situations. We will illustrate with a few examples. Other practical examples will 
appear in the exercises at the end of this section and at the end of the chapter. 


EXAMPLE 3.7

Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of
5 fuses is tested, find the probability of observing at least one defective.


Solution

It is reasonable to assume that Y, the number of defectives observed, has an approx-
imate binomial distribution because the lot is large. Removing a few fuses does not
change the composition of those remaining enough to cause us concern. Thus,


$$P(\text{at least one defective}) = 1 - p(0) = 1 - \binom{5}{0} p^0 q^5 = 1 - (.95)^5 = 1 - .774 = .226.$$


Notice that there is a fairly large chance of seeing at least one defective, even though
the sample is quite small.
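
The arithmetic in Example 3.7 can be reproduced with a short script. The sketch below also compares the binomial value with the exact probability for sampling without replacement, under the added assumption that the lot of 5000 contains exactly 250 defectives (5%); that exact count and the variable names are introduced here for illustration.

```python
from math import comb

def binomial_pmf(y, n, p):
    """Binomial probability p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Binomial approximation used in Example 3.7: n = 5 fuses, p = .05.
approx = 1 - binomial_pmf(0, 5, 0.05)

# Exact value when 5 fuses are drawn without replacement from a lot of 5000
# assumed to hold exactly 250 defectives: 1 - C(4750, 5) / C(5000, 5).
exact = 1 - comb(4750, 5) / comb(5000, 5)

print(round(approx, 3))             # 0.226, matching the text
print(abs(approx - exact) < 0.001)  # True: the approximation is very close
```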


EXAMPLE 3.8

Experience has shown that 30% of all persons afflicted by a certain illness recover.
A drug company has developed a new medication. Ten people with the illness were
selected at random and received the medication; nine recovered shortly thereafter.
Suppose that the medication was absolutely worthless. What is the probability that at
least nine of ten receiving the medication will recover?


Solution

Let Y denote the number of people who recover. If the medication is worthless, the
probability that a single ill person will recover is p = .3. Then the number of trials is
n = 10 and the probability of exactly nine recoveries is


$$P(Y = 9) = p(9) = \binom{10}{9} (.3)^9 (.7) = .000138.$$

Similarly, the probability of exactly ten recoveries is

$$P(Y = 10) = p(10) = \binom{10}{10} (.3)^{10} (.7)^0 = .000006,$$

and

$$P(Y \ge 9) = p(9) + p(10) = .000138 + .000006 = .000144.$$


If the medication is ineffective, the probability of observing at least nine recoveries is
extremely small. If we administered the medication to ten individuals and observed at
least nine recoveries, then either (1) the medication is worthless and we have observed
a rare event or (2) the medication is indeed useful in curing the illness. We adhere to
conclusion 2.
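
The three probabilities in Example 3.8 can be checked directly. The following is a small sketch using only the standard library; the function and variable names are chosen here for illustration.

```python
from math import comb

def binomial_pmf(y, n, p):
    """Binomial probability p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.3  # ten patients, recovery probability .3 if the drug is worthless

p9 = binomial_pmf(9, n, p)    # exactly nine recoveries
p10 = binomial_pmf(10, n, p)  # exactly ten recoveries
at_least_nine = p9 + p10

print(f"{p9:.6f}")             # 0.000138
print(f"{p10:.6f}")            # 0.000006
print(f"{at_least_nine:.6f}")  # 0.000144
```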
A tabulation of binomial probabilities in the form $\sum_{y=0}^{a} p(y)$, presented in Table 1,
Appendix 3, will greatly reduce the computations for some of the exercises. The
references at the end of the chapter list several more extensive tabulations of binomial
probabilities. Due to practical space limitations, printed tables typically apply for only
selected values of n and p. Binomial probabilities can also be found using various
computer software packages. If Y has a binomial distribution based on n trials with
success probability p, P(Y = y₀) = p(y₀) can be found by using the R (or S-
Plus) command dbinom(y0,n,p), whereas P(Y ≤ y₀) is found by using the R
(or S-Plus) command pbinom(y0,n,p). A distinct advantage of using software to
compute binomial probabilities is that (practically) any values for n and p can be
used. We illustrate the use of Table 1 (and, simultaneously, the use of the output of
the R command pbinom(y0,n,p)) in the following example.
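
For readers working outside R, the two quantities can be computed with a few lines of Python. The functions below are hand-written stand-ins that borrow the names `dbinom` and `pbinom` for readability; they are not the R implementations, and the test values n = 10, p = .5 are chosen here only for illustration.

```python
from math import comb

def dbinom(y0, n, p):
    """P(Y = y0) for a binomial Y; plays the role of R's dbinom(y0, n, p)."""
    if y0 < 0 or y0 > n:
        return 0.0
    return comb(n, y0) * p**y0 * (1 - p)**(n - y0)

def pbinom(y0, n, p):
    """P(Y <= y0) for a binomial Y; plays the role of R's pbinom(y0, n, p)."""
    return sum(dbinom(y, n, p) for y in range(0, min(y0, n) + 1))

print(round(dbinom(2, 10, 0.5), 4))  # 0.0439, i.e. 45/1024
print(round(pbinom(2, 10, 0.5), 4))  # 0.0547, i.e. 56/1024
```

If SciPy is installed, `scipy.stats.binom.pmf` and `scipy.stats.binom.cdf` should give the same quantities without hand-written code.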


EXAMPLE 3.9

The large lot of electrical fuses of Example 3.7 is supposed to contain only 5%
defectives. If n = 20 fuses are randomly sampled from this lot, find the probability
that at least four defectives will be observed.


Solution

Letting Y denote the number of defectives in the sample, we assume the binomial
model for Y, with p = .05. Thus,


$$P(Y \ge 4) = 1 - P(Y \le 3),$$


and using Table 1, Appendix 3 [or the R command pbinom(3,20,.05)], we
obtain


$$P(Y \le 3) = \sum_{y=0}^{3} p(y) = .984.$$


The value .984 is found in the table labeled n = 20 in Table 1, Appendix 3. Specifically,
it appears in the column labeled p = .05 and in the row labeled a = 3. It follows
that


$$P(Y \ge 4) = 1 - .984 = .016.$$


This probability is quite small. If we did indeed observe more than three defectives
out of 20 fuses, we might suspect that the reported 5% defect rate is erroneous.
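
The table lookup in Example 3.9 can be reproduced by summing the binomial probabilities directly. This is a minimal sketch; `binomial_cdf` is a name introduced here and simply adds p(0) through p(a).

```python
from math import comb

def binomial_pmf(y, n, p):
    """Binomial probability p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def binomial_cdf(a, n, p):
    """Cumulative probability P(Y <= a), the quantity tabulated in Table 1."""
    return sum(binomial_pmf(y, n, p) for y in range(a + 1))

n, p = 20, 0.05
at_most_three = binomial_cdf(3, n, p)
at_least_four = 1 - at_most_three

print(round(at_most_three, 3))  # 0.984
print(round(at_least_four, 3))  # 0.016
```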


The mean and variance associated with a binomial random variable are derived in
the following theorem. As you will see in the proof of the theorem, it is necessary to
evaluate the sum of some arithmetic series. In the course of the proof, we illustrate
some of the techniques that are available for summing such series. In particular, we
use the fact that $\sum_y p(y) = 1$ for any discrete random variable.


THEOREM 3.7

Let Y be a binomial random variable based on n trials and success probability
p. Then


$$\mu = E(Y) = np \quad \text{and} \quad \sigma^2 = V(Y) = npq.$$


Proof

By Definitions 3.4 and 3.7,


$$E(Y) = \sum_{y} y\,p(y) = \sum_{y=0}^{n} y \binom{n}{y} p^y q^{n-y}.$$


Notice that the first term in the sum is 0 and hence that


$$E(Y) = \sum_{y=1}^{n} y\,\frac{n!}{(n-y)!\,y!}\,p^y q^{n-y} = \sum_{y=1}^{n} \frac{n!}{(n-y)!\,(y-1)!}\,p^y q^{n-y}.$$


The summands in this last expression bear a striking resemblance to binomial
probabilities. In fact, if we factor np out of each term in the sum and let z = y − 1,


$$E(Y) = np \sum_{y=1}^{n} \frac{(n-1)!}{(n-y)!\,(y-1)!}\,p^{y-1} q^{n-y}
= np \sum_{z=0}^{n-1} \frac{(n-1)!}{(n-1-z)!\,z!}\,p^{z} q^{n-1-z}
= np \sum_{z=0}^{n-1} \binom{n-1}{z} p^{z} q^{n-1-z}.$$


Notice that $p(z) = \binom{n-1}{z} p^z q^{n-1-z}$ is the binomial probability function based
on (n − 1) trials. Thus, $\sum_z p(z) = 1$, and it follows that


$$\mu = E(Y) = np.$$


From Theorem 3.6, we know that σ² = V(Y) = E(Y²) − μ². Thus, σ² can be
calculated if we find E(Y²). Finding E(Y²) directly is difficult because


$$E(Y^2) = \sum_{y=0}^{n} y^2 p(y) = \sum_{y=0}^{n} y^2 \binom{n}{y} p^y q^{n-y} = \sum_{y=0}^{n} y^2\,\frac{n!}{y!\,(n-y)!}\,p^y q^{n-y}$$


and the quantity y² does not appear as a factor of y!. Where do we go from
here? Notice that


$$E[Y(Y - 1)] = E(Y^2 - Y) = E(Y^2) - E(Y)$$

and, therefore,

$$E(Y^2) = E[Y(Y - 1)] + E(Y) = E[Y(Y - 1)] + \mu.$$

In this case,


$$E[Y(Y - 1)] = \sum_{y=0}^{n} y(y-1)\,\frac{n!}{y!\,(n-y)!}\,p^y q^{n-y}.$$

The first and second terms of this sum equal zero (when y = 0 and y = 1).
Then

$$E[Y(Y - 1)] = \sum_{y=2}^{n} \frac{n!}{(y-2)!\,(n-y)!}\,p^y q^{n-y}.$$

(Notice the cancellation that led to this last result. The anticipation of this
cancellation is what actually motivated the consideration of E[Y(Y − 1)].)
Again, the summands in the last expression look very much like binomial
probabilities. Factor n(n − 1)p² out of each term in the sum and let z = y − 2
to obtain


$$E[Y(Y - 1)] = n(n-1)p^2 \sum_{y=2}^{n} \frac{(n-2)!}{(y-2)!\,(n-y)!}\,p^{y-2} q^{n-y}
= n(n-1)p^2 \sum_{z=0}^{n-2} \frac{(n-2)!}{z!\,(n-2-z)!}\,p^{z} q^{n-2-z}
= n(n-1)p^2 \sum_{z=0}^{n-2} \binom{n-2}{z} p^{z} q^{n-2-z}.$$


Again note that $p(z) = \binom{n-2}{z} p^z q^{n-2-z}$ is the binomial probability function
based on (n − 2) trials. Then $\sum_{z=0}^{n-2} p(z) = 1$ (again using the device illustrated
in the derivation of the mean) and


$$E[Y(Y - 1)] = n(n-1)p^2.$$

Thus,

$$E(Y^2) = E[Y(Y - 1)] + \mu = n(n-1)p^2 + np$$

and

$$\sigma^2 = E(Y^2) - \mu^2 = n(n-1)p^2 + np - n^2p^2 = np[(n-1)p + 1 - np] = np(1 - p) = npq.$$
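
The results of Theorem 3.7 can be checked numerically by summing over the probability function, following the same route as the proof: first E(Y), then E[Y(Y − 1)], then the variance. The sketch below is illustrative only; the function name `moments` and the values n = 20, p = .05 are choices made here.

```python
from math import comb, isclose

def binomial_pmf(y, n, p):
    """Binomial probability p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def moments(n, p):
    """Return E(Y), E[Y(Y - 1)], and V(Y), each computed by direct summation."""
    mean = sum(y * binomial_pmf(y, n, p) for y in range(n + 1))
    factorial_moment = sum(y * (y - 1) * binomial_pmf(y, n, p) for y in range(n + 1))
    second_moment = factorial_moment + mean  # E(Y^2) = E[Y(Y - 1)] + E(Y)
    variance = second_moment - mean**2       # V(Y) = E(Y^2) - mu^2
    return mean, factorial_moment, variance

n, p = 20, 0.05
mean, factorial_moment, variance = moments(n, p)

print(isclose(mean, n * p, rel_tol=1e-9))                            # E(Y) = np
print(isclose(factorial_moment, n * (n - 1) * p**2, rel_tol=1e-9))   # E[Y(Y-1)] = n(n-1)p^2
print(isclose(variance, n * p * (1 - p), rel_tol=1e-9))              # V(Y) = npq
```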


In addition to providing formulas for the mean and variance of a binomial random 
variable, the derivation of Theorem 3.7 illustrates the use of two fairly common tricks, 
namely, to use the fact that $\sum_y p(y) = 1$ if p(y) is a valid probability function and to
find E(Y²) by finding E[Y(Y − 1)]. These techniques also will be useful in the next
sections where we consider other discrete probability distributions and the associated 
means and variances. 

A frequent source of error in applying the binomial probability distribution to
practical problems is the failure to define which of the two possible results of a trial
is the success. As a consequence, q may be used erroneously in place of p. Carefully
define a success and make certain that p equals the probability of a success for each
application.

Thus far in this section we have assumed that the number of trials, n, and the
probability of success, p, were known, and we used the formula $p(y) = \binom{n}{y} p^y q^{n-y}$ to
compute probabilities associated with binomial random variables. In Example 3.8 we
obtained a value for P(Y ≥ 9) and used this probability to reach a conclusion about
the effectiveness of the medication. The next example exhibits another statistical, 
rather than probabilistic, use of the binomial distribution. 


EXAMPLE 3.10

Suppose that we survey 20 individuals working for a large company and ask each
whether they favor implementation of a new policy regarding retirement funding. If,
in our sample, 6 favored the new policy, find an estimate for p, the true but unknown
proportion of employees that favor the new policy.


Solution

If Y denotes the number among the 20 who favor the new policy, it is reasonable
to conclude that Y has a binomial distribution with n = 20 for some value of p.
Whatever the true value for p, we conclude that the probability of observing 6 out of
20 in favor of the policy is


$$P(Y = 6) = \binom{20}{6} p^6 (1 - p)^{14}.$$


We will use as our estimate for p the value that maximizes the probability of observing
the value that we actually observed (6 in favor in 20 trials). How do we find the value
of p that maximizes P(Y = 6)?

Because $\binom{20}{6}$ is a constant (relative to p) and ln(w) is an increasing function of w,
the value of p that maximizes $P(Y = 6) = \binom{20}{6} p^6 (1 - p)^{14}$ is the same as the value
of p that maximizes $\ln[p^6 (1 - p)^{14}] = [6\ln(p) + 14\ln(1 - p)]$.

If we take the derivative of [6 ln(p) + 14 ln(1 − p)] with respect to p, we obtain


$$\frac{d[6\ln(p) + 14\ln(1 - p)]}{dp} = \frac{6}{p} - \frac{14}{1 - p}.$$


The value of p that maximizes (or minimizes) [6 ln(p) + 14 ln(1 − p)] [and, more
important, P(Y = 6)] is the solution to the equation


$$\frac{6}{p} - \frac{14}{1 - p} = 0.$$


Solving, we obtain p = 6/20.

Because the second derivative of [6 ln(p) + 14 ln(1 − p)] is negative when p =
6/20, it follows that [6 ln(p) + 14 ln(1 − p)] [and P(Y = 6)] is maximized when
p = 6/20. Our estimate for p, based on 6 "successes" in 20 trials is therefore 6/20.

The ultimate answer that we obtained should look very reasonable to you. Because
p is the probability of a "success" on any given trial, a reasonable estimate is, indeed,
the proportion of "successes" in our sample, in this case 6/20. In the next section, we
will apply this same technique to obtain an estimate that is not initially so intuitive. As
we will see in Chapter 9, the estimate that we just obtained is the maximum likelihood
estimate for p and the procedure used above is an example of the application of the
method of maximum likelihood.
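
The maximization in Example 3.10 can also be carried out numerically. The sketch below swaps the calculus for a simple grid search over p, which is a stand-in technique chosen for illustration rather than the method used in the text; the grid spacing of .001 and the function names are assumptions made here.

```python
from math import log

def log_likelihood(p, successes=6, trials=20):
    """ln[p^6 (1 - p)^14] for the observed data; the constant C(20, 6) is omitted."""
    return successes * log(p) + (trials - successes) * log(1 - p)

def score(p, successes=6, trials=20):
    """Derivative of the log-likelihood: 6/p - 14/(1 - p)."""
    return successes / p - (trials - successes) / (1 - p)

# Candidate values .001, .002, ..., .999 (endpoints excluded because ln(0) is undefined).
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)                    # 0.3, i.e. 6/20
print(abs(score(p_hat)) < 1e-9) # True: the derivative vanishes at the maximum
```

The grid search agrees with the closed-form answer here; for other problems the grid only locates the maximum to within its spacing.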
Exercises


Consider the population of voters described in Example 3.6. Suppose that there are N = 5000
voters in the population, 40% of whom favor Jones. Identify the event favors Jones as a
success S. It is evident that the probability of S on trial 1 is .40. Consider the event B that S
occurs on the second trial. Then B can occur two ways: The first two trials are both successes
or the first trial is a failure and the second is a success. Show that P(B) = .4. What is
P(B | the first trial is S)? Does this conditional probability differ markedly from P(B)?


The manufacturer of a low-calorie dairy drink wishes to compare the taste appeal of a new 
formula (formula B) with that of the standard formula (formula A). Each of four judges is given 
three glasses in random order, two containing formula A and the other containing formula B. 
Each judge is asked to state which glass he or she most enjoyed. Suppose that the two formulas 
are equally attractive. Let Y be the number of judges stating a preference for the new formula. 


a Find the probability function for Y.


b What is the probability that at least three of the four judges state a preference for the new
formula?


c Find the expected value of Y.


d Find the variance of Y. 


In 2003, the average combined SAT score (math and verbal) for college-bound students in the 
United States was 1026. Suppose that approximately 45% of all high school graduates took
this test and that 100 high school graduates are randomly selected from among all high school
grads in the United States. Which of the following random variables has a distribution that can
be approximated by a binomial distribution? Whenever possible, give the values for n and p.


a The number of students who took the SAT
b The scores of the 100 students in the sample
c The number of students in the sample who scored above average on the SAT
d The amount of time required by each student to complete the SAT
e The number of female high school grads in the sample


a A meteorologist in Denver recorded Y = the number of days of rain during a 30-day period.
Does Y have a binomial distribution? If so, are the values of both n and p given?


b A market research firm has hired operators who conduct telephone surveys. A computer
is used to randomly dial a telephone number, and the operator asks the answering person
whether she has time to answer some questions. Let Y = the number of calls made until the
first person replies that she is willing to answer the questions. Is this a binomial experiment?
Explain.
A complex electronic system is built with a certain number of backup components in its
subsystems. One subsystem has four identical components, each with a probability of .2 of 
failing in less than 1000 hours. The subsystem will operate if any two of the four components 
are operating. Assume that the components operate independently. Find the probability that 


a exactly two of the four components last longer than 1000 hours. 
b the subsystem operates longer than 1000 hours. 


The probability that a patient recovers from a stomach disease is .8. Suppose 20 people are 
known to have contracted this disease. What is the probability that 


a exactly 14 recover?
b at least 10 recover?
c at least 14 but not more than 18 recover?
d at most 16 recover?


A multiple-choice examination has 15 questions, each with five possible answers, only one of 
which is correct. Suppose that one of the students who takes the examination answers each of 
the questions with an independent random guess. What is the probability that he answers at 
least ten questions correctly? 


Refer to Exercise 3.41. What is the probability that a student answers at least ten questions 
correctly if 


a for each question, the student can correctly eliminate one of the wrong answers and
subsequently answers each of the questions with an independent random guess among the
remaining answers?

b he can correctly eliminate two wrong answers for each question and randomly chooses 
from among the remaining answers? 


Many utility companies promote energy conservation by offering discount rates to consumers 
who keep their energy usage below certain established subsidy standards. A recent EPA report 
notes that 70% of the island residents of Puerto Rico have reduced their electricity usage 
sufficiently to qualify for discounted rates. If five residential subscribers are randomly selected 
from San Juan, Puerto Rico, find the probability of each of the following events:


a All five qualify for the favorable rates.
b At least four qualify for the favorable rates.


A new surgical procedure is successful with a probability of p. Assume that the operation is 
performed five times and the results are independent of one another. What is the probability 
that 


a all five operations are successful if p = .8?
b exactly four are successful if p = .6?
c less than two are successful if p = .3?


A fire-detection device utilizes three temperature-sensitive cells acting independently of each 
other in such a manner that any one or more may activate the alarm. Each cell possesses a 
probability of p = .8 of activating the alarm when the temperature reaches 100° Celsius or 
more. Let Y equal the number of cells activating the alarm when the temperature reaches 100°. 


a Find the probability distribution for Y. 


b Find the probability that the alarm will function when the temperature reaches 100°.


Construct probability histograms for the binomial probability distributions for n = 5, p = .1,
.5, and .9. (Table 1, Appendix 3, will reduce the amount of calculation.) Notice the symmetry
for p = .5 and the direction of skewness for p = .1 and .9.


Use Table 1, Appendix 3, to construct a probability histogram for the binomial probability
distribution for n = 20 and p = .5. Notice that almost all the probability falls in the interval
5 ≤ y ≤ 15.


In Exercise 2.151, you considered a model for the World Series. Two teams A and B play a series 
of games until one team wins four games. We assume that the games are played independently 
and that the probability that A wins any game is p. Compute the probability that the series lasts 
exactly five games. [Hint: Use what you know about the random variable, Y, the number of 
games that A wins among the first four games.] 


Tay-Sachs disease is a genetic disorder that is usually fatal in young children. If both parents are 
carriers of the disease, the probability that their offspring will develop the disease is approxi- 
mately .25. Suppose that a husband and wife are both carriers and that they have three children. 
If the outcomes of the three pregnancies are mutually independent, what are the probabilities 
of the following events? 


a All three children develop Tay-Sachs.
b Only one child develops Tay-Sachs. 
c The third child develops Tay-Sachs, given that the first two did not. 


A missile protection system consists of n radar sets operating independently, each with a
probability of .9 of detecting a missile entering a zone that is covered by all of the units.


a If n = 5 and a missile enters the zone, what is the probability that exactly four sets detect
the missile? At least one set?


b How large must n be if we require that the probability of detecting a missile that enters the
zone be .999?


In the 17th century, the Chevalier de Mere asked Blaise Pascal to compare the probabilities of
two events. Below, you will compute the probability of the two events that, prior to contrary 
gambling experience, were thought by de Mere to be equally likely. 


a What is the probability of obtaining at least one 6 in four rolls of a fair die? 
b If a pair of fair dice is tossed 24 times, what is the probability of at least one double six?


The taste test for PTC (phenylthiocarbamide) is a favorite exercise in beginning human genetics 
classes. It has been established that a single gene determines whether or not an individual is a 
“taster.” If 70% of Americans are “tasters” and 20 Americans are randomly selected, what is 
the probability that 


a at least 17 are “tasters”? 


b fewer than 15 are “tasters”? 


A manufacturer of floor wax has developed two new brands, A and B, which she wishes to 
subject to homeowners’ evaluation to determine which of the two is superior. Both waxes, 
A and B, are applied to floor surfaces in each of 15 homes. Assume that there is actually no 
difference in the quality of the brands. What is the probability that ten or more homeowners 
would state a preference for 


a brand A? 
b either brand A or brand B?


Suppose that Y is a binomial random variable based on n trials with success probability p and
consider Y* = n − Y.


a Argue that for y* = 0, 1, ..., n
$$P(Y^* = y^*) = P(n - Y = y^*) = P(Y = n - y^*).$$


b Use the result from part (a) to show that


$$P(Y^* = y^*) = \binom{n}{n - y^*} p^{\,n - y^*} q^{\,y^*} = \binom{n}{y^*} q^{\,y^*} p^{\,n - y^*}.$$


c The result in part (b) implies that Y* has a binomial distribution based on n trials and
“success” probability p* = q = 1 − p. Why is this result “obvious”?


Suppose that Y is a binomial random variable with n > 2 trials and success probability p.
Use the technique presented in Theorem 3.7 and the fact that $E[Y(Y - 1)(Y - 2)] = E(Y^3) - 3E(Y^2) + 2E(Y)$
to derive $E(Y^3)$.


An oil exploration firm is formed with enough capital to finance ten explorations. The probabil- 
ity of a particular exploration being successful is .1. Assume the explorations are independent. 
Find the mean and variance of the number of successful explorations. 


Refer to Exercise 3.56. Suppose the firm has a fixed cost of $20,000 in preparing equipment prior 
to doing its first exploration. If each successful exploration costs $30,000 and each unsuccessful 
exploration costs $15,000, find the expected total cost to the firm for its ten explorations. 


A particular concentration of a chemical found in polluted water has been found to be lethal to 
20% of the fish that are exposed to the concentration for 24 hours. Twenty fish are placed in a 
tank containing this concentration of chemical in water. 


a Find the probability that exactly 14 survive.
b Find the probability that at least 10 survive.
c Find the probability that at most 16 survive.
d Find the mean and variance of the number that survive.


Ten motors are packaged for sale in a certain warehouse. The motors sell for $100 each, but a 
double-your-money-back guarantee is in effect for any defectives the purchaser may receive. 
Find the expected net gain for the seller if the probability of any one motor being defective is 
.08. (Assume that the quality of any one motor is independent of that of the others.) 


A particular sale involves four items randomly selected from a large lot that is known to contain
10% defectives. Let Y denote the number of defectives among the four sold. The purchaser of
the items will return the defectives for repair, and the repair cost is given by $C = 3Y^2 + Y + 2$.
Find the expected repair cost. [Hint: The result of Theorem 3.6 implies that, for any random
variable Y, $E(Y^2) = \sigma^2 + \mu^2$.]


Of the volunteers donating blood in a clinic, 80% have the Rhesus (Rh) factor present in their
blood.


a If five volunteers are randomly selected, what is the probability that at least one does not 
have the Rh factor? 

b If five volunteers are randomly selected, what is the probability that at most four have the 
Rh factor? 

c What is the smallest number of volunteers who must be selected if we want to be at least
90% certain that we obtain at least five donors with the Rh factor? 
Goranson and Hall (1980) explain that the probability of detecting a crack in an airplane wing
is the product of $p_1$, the probability of inspecting a plane with a wing crack; $p_2$, the probability
of inspecting the detail in which the crack is located; and $p_3$, the probability of detecting the
damage.


a What assumptions justify the multiplication of these probabilities? 


b Suppose $p_1 = .9$, $p_2 = .8$, and $p_3 = .5$ for a certain fleet of planes. If three planes are
inspected from this fleet, find the probability that a wing crack will be detected on at least
one of them.


Consider the binomial distribution with n trials and P(S) = p.


a Show that
$$\frac{p(y)}{p(y - 1)} = \frac{(n - y + 1)p}{yq} \qquad \text{for } y = 1, 2, \ldots, n.$$
Equivalently, for y = 1, 2, ..., n, the equation
$$p(y) = \frac{(n - y + 1)p}{yq}\, p(y - 1)$$
gives a recursive relationship between the probabilities associated with successive values of Y.

b If n = 90 and p = .04, use the above relationship to find P(Y < 3).

c Show that
$$\frac{p(y)}{p(y - 1)} = \frac{(n - y + 1)p}{yq} > 1 \quad \text{if } y < (n + 1)p,$$
that $\frac{p(y)}{p(y - 1)} < 1$ if y > (n + 1)p, and that $\frac{p(y)}{p(y - 1)} = 1$ if (n + 1)p is an integer and y = (n + 1)p. This establishes
that p(y) > p(y − 1) if y is small (y < (n + 1)p) and p(y) < p(y − 1) if y is large
(y > (n + 1)p). Thus, successive binomial probabilities increase for a while and decrease
from then on.

d Show that the value of y assigned the largest probability is equal to the greatest integer less
than or equal to (n + 1)p. If (n + 1)p = m for some integer m, then p(m) = p(m − 1).


Consider an extension of the situation discussed in Example 3.10. If there are n trials in a
binomial experiment and we observe $y_0$ “successes,” show that $P(Y = y_0)$ is maximized when
$p = y_0/n$. Again, we are determining (in general this time) the value of p that maximizes the
probability of the value of Y that we actually observed.


Refer to Exercise 3.64. The maximum likelihood estimator for p is Y/n (note that Y is the 
binomial random variable, not a particular value of it). 


a Derive E(Y/n). In Chapter 9, we will see that this result implies that Y/n is an unbiased 
estimator for p. 


b Derive V (Y/n). What happens to V (Y/n) as n gets large? 


The Geometric Probability Distribution 


The random variable with the geometric probability distribution is associated with 
an experiment that shares some of the characteristics of a binomial experiment. This 
experiment also involves identical and independent trials, each of which can result in 
one of two outcomes: success or failure. The probability of success is equal to p and 
is constant from trial to trial. However, instead of the number of successes that occur 
in n trials, the geometric random variable Y is the number of the trial on which the 
first success occurs. Thus, the experiment consists of a series of trials that concludes 
with the first success. Consequently, the experiment could end with the first trial if a 
success is observed on the very first trial, or the experiment could go on indefinitely. 
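

The experiment just described can be imitated on a computer. The following is only a minimal sketch, assuming Python's standard `random` module; the function name `first_success_trial`, the seed, and the number of repetitions are illustrative choices, not part of the text. It repeats independent trials with success probability p until the first success and records the number of that trial.

```python
import random

def first_success_trial(p, rng):
    """Run independent trials with success probability p (p > 0 assumed);
    return the number of the trial on which the first success occurs."""
    y = 1
    while rng.random() >= p:   # a failure occurs with probability 1 - p
        y += 1
    return y

rng = random.Random(0)          # fixed seed so the run is repeatable
draws = [first_success_trial(0.5, rng) for _ in range(10_000)]

# Fraction of runs that ended on the very first trial; it should be near p.
share_first = draws.count(1) / len(draws)
print(min(draws), max(draws), share_first)
```

Every simulated value is at least 1, and there is no fixed upper limit on the values that can occur, which matches the description of an experiment that may end at once or go on indefinitely.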


The sample space S for the experiment contains the countably infinite set of sample
points:


$E_1$: S (success on first trial)
$E_2$: FS (failure on first, success on second)
$E_3$: FFS (first success on the third trial)
$E_4$: FFFS (first success on the fourth trial)
$E_k$: FFFF...FS, with k − 1 failures (first success on the kth trial)


Because the random variable Y is the number of trials up to and including the first
success, the events (Y = 1), (Y = 2), and (Y = 3) contain only the sample points
$E_1$, $E_2$, and $E_3$, respectively. More generally, the numerical event (Y = y) contains
only $E_y$. Because the trials are independent, for any y = 1, 2, 3, ...,


$$p(y) = P(Y = y) = P(E_y) = P(\underbrace{FFF \cdots F}_{y-1}\,S) = \underbrace{qqq \cdots q}_{y-1}\,p = q^{y-1}p.$$


DEFINITION 3.8


A random variable Y is said to have a geometric probability distribution if and
only if


$$p(y) = q^{y-1}p, \qquad y = 1, 2, 3, \ldots, \quad 0 \le p \le 1.$$


A probability histogram for p(y), p = .5, is shown in Figure 3.5. Areas over
intervals correspond to probabilities, as they did for the frequency distributions of
data in Chapter 1, except that Y can assume only discrete values, y = 1, 2, ..., ∞.
That p(y) ≥ 0 is obvious by inspection of the respective values. In Exercise 3.66
you will show that these probabilities add up to 1, as is required for any valid discrete
probability distribution.


FIGURE 3.5 The geometric probability distribution, p = .5
The geometric probability distribution is often used to model distributions of
lengths of waiting times. For example, suppose that a commercial aircraft engine 
is serviced periodically so that its various parts are replaced at different points in 
time and hence are of varying ages. Then the probability of engine malfunction, p, 
during any randomly observed one-hour interval of operation might be the same as 
for any other one-hour interval. The length of time prior to engine malfunction is 
the number of one-hour intervals, Y , until the first malfunction. (For this application, 
engine malfunction in a given one-hour period is defined as a success. Notice that, as 
in the case of the binomial experiment, either of the two outcomes of a trial can be 
defined as a success. Again, a “success” is not necessarily what would be considered 
to be “good” in everyday conversation.) 


EXAMPLE 3.11


Suppose that the probability of engine malfunction during any one-hour period is
p = .02. Find the probability that a given engine will survive two hours.


Solution


Letting Y denote the number of one-hour intervals until the first malfunction, we have


$$P(\text{survive two hours}) = P(Y \ge 3) = \sum_{y=3}^{\infty} p(y).$$


Because $\sum_{y=1}^{\infty} p(y) = 1$,


$$P(\text{survive two hours}) = 1 - \sum_{y=1}^{2} p(y) = 1 - p - qp = 1 - .02 - (.98)(.02) = .9604.$$
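

The calculation in Example 3.11 can be checked numerically. The sketch below is an illustration only, written in plain Python; the helper names `geom_pmf` and `geom_cdf` are made up for this purpose. It evaluates the geometric probability function from its defining formula and subtracts P(Y ≤ 2) from 1.

```python
def geom_pmf(y, p):
    """p(y) = q^(y-1) * p for y = 1, 2, 3, ... (number of the first-success trial)."""
    if y < 1:
        return 0.0
    return (1 - p) ** (y - 1) * p

def geom_cdf(y, p):
    """P(Y <= y), obtained by summing the probability function term by term."""
    return sum(geom_pmf(k, p) for k in range(1, y + 1))

p = 0.02
survive_two_hours = 1 - geom_cdf(2, p)   # P(Y >= 3) = 1 - p(1) - p(2)
print(round(survive_two_hours, 4))
```

The printed value agrees with the .9604 obtained in the example, and it also equals q² = (.98)², the probability of no malfunction in the first two one-hour intervals.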


If you examine the formula for the geometric distribution given in Definition 3.8,
you will see that larger values of p (and hence smaller values of q) lead to higher
probabilities for the smaller values of Y and hence lower probabilities for the larger
values of Y. Thus, the mean value of Y appears to be inversely proportional to p.
As we show in the next theorem, the mean of a random variable with a geometric
distribution is actually equal to 1/p.


THEOREM 3.8


If Y is a random variable with a geometric distribution,


$$\mu = E(Y) = \frac{1}{p} \qquad \text{and} \qquad \sigma^2 = V(Y) = \frac{1 - p}{p^2}.$$


Proof


$$E(Y) = \sum_{y=1}^{\infty} y\,q^{y-1}p = p\sum_{y=1}^{\infty} y\,q^{y-1}.$$


This series might seem to be difficult to sum directly. Actually, it can be summed
easily if we take into account that, for y ≥ 1,


$$\frac{d}{dq}\left(q^y\right) = y\,q^{y-1},$$


and, hence,


$$\frac{d}{dq}\left(\sum_{y=1}^{\infty} q^y\right) = \sum_{y=1}^{\infty} y\,q^{y-1}.$$


(The interchanging of derivative and sum here can be justified.) Substituting,
we obtain


$$E(Y) = p\sum_{y=1}^{\infty} y\,q^{y-1} = p\,\frac{d}{dq}\left(\sum_{y=1}^{\infty} q^y\right).$$


The latter sum is the geometric series, $q + q^2 + q^3 + \cdots$, which is equal to
q/(1 − q) (see Appendix A1.11). Therefore,


$$E(Y) = p\,\frac{d}{dq}\left(\frac{q}{1 - q}\right) = p\left[\frac{1}{(1 - q)^2}\right] = \frac{p}{p^2} = \frac{1}{p}.$$


To summarize, our approach is to express a series that cannot be summed
directly as the derivative of a series for which the sum can be readily obtained.
Once we evaluate the more easily handled series, we differentiate to complete
the process.

The derivation of the variance is left as Exercise 3.85.


EXAMPLE 3.12


If the probability of engine malfunction during any one-hour period is p = .02 and
Y denotes the number of one-hour intervals until the first malfunction, find the mean
and standard deviation of Y.


Solution


As in Example 3.11, it follows that Y has a geometric distribution with p = .02.
Thus, E(Y) = 1/p = 1/(.02) = 50, and we expect to wait quite a few hours before
encountering a malfunction. Further, V(Y) = .98/.0004 = 2450, and it follows that
the standard deviation of Y is $\sigma = \sqrt{2450} = 49.497$.
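

The values in Example 3.12 can also be approximated straight from the defining series, without the closed forms of Theorem 3.8. The following sketch is an illustration only; the function name `geom_moments` and the truncation point of 5000 terms are arbitrary assumptions, chosen so that the neglected tail is negligible for p = .02.

```python
def geom_moments(p, terms=5000):
    """Approximate E(Y) and V(Y) by truncating the sums of y*p(y) and y^2*p(y)."""
    q = 1 - p
    mean = 0.0
    second_moment = 0.0
    prob = p                       # p(1) = p
    for y in range(1, terms + 1):
        mean += y * prob
        second_moment += y * y * prob
        prob *= q                  # p(y + 1) = q * p(y)
    return mean, second_moment - mean ** 2

mean, var = geom_moments(0.02)
print(mean, var, var ** 0.5)
```

The three printed numbers should be close to 50, 2450, and 49.497, matching 1/p, (1 − p)/p², and the standard deviation found in the example.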


Although the computation of probabilities associated with geometric random variables
can be accomplished by evaluating a single value or partial sums associated with
a geometric series, these probabilities can also be found using various computer software
packages. If Y has a geometric distribution with success probability p, $P(Y = y_0) = p(y_0)$
can be found by using the R (or S-Plus) command dgeom(y0-1, p),
whereas $P(Y \le y_0)$ is found by using the R (or S-Plus) command pgeom(y0-1, p).
For example, the R (or S-Plus) command pgeom(1, 0.02) yields the value for
P(Y ≤ 2) that was implicitly used in Example 3.11. Note that the argument in these
commands is the value $y_0 - 1$, not the value $y_0$. This is because some authors prefer to
define the geometric distribution to be that of the random variable Y* = the number of
failures before the first success. In our formulation, the geometric random variable Y
is interpreted as the number of the trial on which the first success occurs. In Exercise
3.88, you will see that Y* = Y − 1. Due to this relationship between the two versions of
geometric random variables, $P(Y = y_0) = P(Y - 1 = y_0 - 1) = P(Y^* = y_0 - 1)$. R
computes probabilities associated with Y*, explaining why the arguments for dgeom
and pgeom are $y_0 - 1$ instead of $y_0$.

The next example, similar to Example 3.10, illustrates how knowledge of the
geometric probability distribution can be used to estimate an unknown value of p,
the probability of a success.


EXAMPLE 3.13


Suppose that we interview successive individuals working for the large company
discussed in Example 3.10 and stop interviewing when we find the first person who
likes the policy. If the fifth person interviewed is the first one who favors the new
policy, find an estimate for p, the true but unknown proportion of employees who
favor the new policy.


Solution


If Y denotes the number of individuals interviewed until we find the first person who
likes the new retirement plan, it is reasonable to conclude that Y has a geometric
distribution for some value of p. Whatever the true value for p, we conclude that the
probability of observing the first person in favor of the policy on the fifth trial is


$$P(Y = 5) = (1 - p)^4 p.$$


We will use as our estimate for p the value that maximizes the probability of observing
the value that we actually observed (the first success on trial 5).

To find the value of p that maximizes P(Y = 5), we again observe that the value
of p that maximizes $P(Y = 5) = (1 - p)^4 p$ is the same as the value of p that
maximizes $\ln[(1 - p)^4 p] = [4\ln(1 - p) + \ln(p)]$.

If we take the derivative of $[4\ln(1 - p) + \ln(p)]$ with respect to p, we obtain


$$\frac{d[4\ln(1 - p) + \ln(p)]}{dp} = \frac{-4}{1 - p} + \frac{1}{p}.$$


Setting this derivative equal to 0 and solving, we obtain p = 1/5.

Because the second derivative of $[4\ln(1 - p) + \ln(p)]$ is negative when p = 1/5,
it follows that $[4\ln(1 - p) + \ln(p)]$ [and P(Y = 5)] is maximized when p = 1/5.
Our estimate for p, based on observing the first success on the fifth trial, is 1/5.

Perhaps this result is a little more surprising than the answer we obtained in
Example 3.10 where we estimated p on the basis of observing 6 in favor of the new
plan in a sample of size 20. Again, this is an example of the use of the method of
maximum likelihood that will be studied in more detail in Chapter 9.
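

The maximization in Example 3.13 can be confirmed by a crude numerical search. The sketch below is only an illustration and replaces the calculus argument with a grid search; the function name `likelihood` and the grid spacing of .001 are assumptions made for the demonstration.

```python
def likelihood(p, y0=5):
    """P(Y = y0) for a geometric random variable, viewed as a function of p."""
    return (1 - p) ** (y0 - 1) * p

# Candidate values p = .001, .002, ..., .999
grid = [k / 1000 for k in range(1, 1000)]

# The grid point with the largest probability of the observed value y0 = 5
p_hat = max(grid, key=likelihood)
print(p_hat, likelihood(p_hat))
```

The search settles on p = .2, the same value found by setting the derivative of the log of P(Y = 5) equal to zero. A grid search only locates the maximum to within the grid spacing, so the calculus argument remains the actual justification.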
Exercises


Suppose that Y is a random variable with a geometric distribution. Show that


a $\sum_{y} p(y) = \sum_{y=1}^{\infty} q^{y-1}p = 1$.

b $\frac{p(y)}{p(y - 1)} = q$, for y = 2, 3, .... This ratio is less than 1, implying that the geometric
probabilities are monotonically decreasing as a function of y. If Y has a geometric
distribution, what value of Y is the most likely (has the highest probability)?


Suppose that 30% of the applicants for a certain industrial job possess advanced training in com- 
puter programming. Applicants are interviewed sequentially and are selected at random from 
the pool. Find the probability that the first applicant with advanced training in programming 
is found on the fifth interview. 


Refer to Exercise 3.67. What is the expected number of applicants who need to be interviewed 
in order to find the first one with advanced training? 


About six months into George W. Bush's second term as president, a Gallup poll indicated that
a near record (low) level of 41% of adults expressed “a great deal” or “quite a lot” of confidence
in the U.S. Supreme Court (http://www.gallup.com/poll/content/default.aspx?ci=17011, June
2005). Suppose that you conducted your own telephone survey at that time and randomly called
people and asked them to describe their level of confidence in the Supreme Court. Find the
probability distribution for Y, the number of calls until the first person is found who does not
express “a great deal” or “quite a lot” of confidence in the U.S. Supreme Court.


An oil prospector will drill a succession of holes in a given area to find a productive well. The
probability that he is successful on a given trial is .2.

a What is the probability that the third hole drilled is the first to yield a productive well?
b If the prospector can afford to drill at most ten wells, what is the probability that he will
fail to find a productive well?


Let Y denote a geometric random variable with probability of success p.

a Show that for a positive integer a,
$$P(Y > a) = q^a.$$
b Show that for positive integers a and b,
$$P(Y > a + b \mid Y > a) = q^b = P(Y > b).$$
This result implies that, for example, P(Y > 7 | Y > 2) = P(Y > 5). Why do you think
this property is called the memoryless property of the geometric distribution?

c In the development of the distribution of the geometric random variable, we assumed
that the experiment consisted of conducting identical and independent trials until the first
success was observed. In light of these assumptions, why is the result in part (b) “obvious”?


Given that we have already tossed a balanced coin ten times and obtained zero heads, what is 


the probability that we must toss it at least two more times to obtain the first head? 


A certified public accountant (CPA) has found that nine of ten company audits contain sub- 
stantial errors. If the CPA audits a series of company accounts, what is the probability that the 
first account containing substantial errors 

a is the third one to be audited?


b will occur on or after the third audited account? 
Refer to Exercise 3.73. What are the mean and standard deviation of the number of accounts
that must be examined to find the first one with substantial errors?


The probability of a customer arrival at a grocery service counter in any one second is equal 
to .1. Assume that customers arrive in a random stream and hence that an arrival in any one 
second is independent of all others. Find the probability that the first arrival 


a will occur during the third one-second interval. 

b will not occur until at least the third one-second interval. 

Of a population of consumers, 60% are reputed to prefer a particular brand, A, of toothpaste.
If a group of randomly selected consumers is interviewed, what is the probability that exactly
five people have to be interviewed to encounter the first consumer who prefers brand A? At
least five people?


If Y has a geometric distribution with success probability p, show that


$$P(Y = \text{an odd integer}) = \frac{p}{1 - q^2}.$$


If Y has a geometric distribution with success probability .3, what is the largest value, $y_0$, such
that $P(Y > y_0) \ge .1$?


How many times would you expect to toss a balanced coin in order to obtain the first head? 


Two people took turns tossing a fair die until one of them tossed a 6. Person A tossed first, B 
second, A third, and so on. Given that person B threw the first 6, what is the probability that B 
obtained the first 6 on her second toss (that is, on the fourth toss overall)? 


In responding to a survey question on a sensitive topic (such as “Have you ever tried
marijuana?”), many people prefer not to respond in the affirmative. Suppose that 80% of
the population have not tried marijuana and all of those individuals will truthfully answer no
to your question. The remaining 20% of the population have tried marijuana and 70% of those
individuals will lie. Derive the probability distribution of Y, the number of people you would
need to question in order to obtain a single affirmative response.


Refer to Exercise 3.70. The prospector drills holes until he finds a productive well. How many 
holes would the prospector expect to drill? Interpret your answer intuitively. 


The secretary in Exercises 2.121 and 3.16 was given n computer passwords and tries the 
passwords at random. Exactly one of the passwords permits access to a computer file. Suppose 
now that the secretary selects a password, tries it, and—if it does not work—puts it back in 
with the other passwords before randomly selecting the next password to try (not a very clever 
secretary!). What is the probability that the correct password is found on the sixth try? 


Refer to Exercise 3.83. Find the mean and the variance of Y, the number of the trial on which 
the correct password is first identified. 


Find E[Y(Y − 1)] for a geometric random variable Y by finding $\frac{d^2}{dq^2}\left(\sum_{y=1}^{\infty} q^y\right)$. Use this
result to find the variance of Y.


Consider an extension of the situation discussed in Example 3.13. If we observe $y_0$ as the value
for a geometric random variable Y, show that $P(Y = y_0)$ is maximized when $p = 1/y_0$. Again,
we are determining (in general this time) the value of p that maximizes the probability of the
value of Y that we actually observed.


*3.87 Refer to Exercise 3.86. The maximum likelihood estimator for p is 1/Y (note that Y is the
geometric random variable, not a particular value of it). Derive E(1/Y). [Hint: If |r| < 1,
$\sum_{i=1}^{\infty} r^i/i = -\ln(1 - r)$.]


*3.88 If Y is a geometric random variable, define Y* = Y − 1. If Y is interpreted as the number of
the trial on which the first success occurs, then Y* can be interpreted as the number of failures
before the first success. If Y* = Y − 1, P(Y* = y) = P(Y − 1 = y) = P(Y = y + 1) for
y = 0, 1, 2, .... Show that


$$P(Y^* = y) = q^y p, \qquad y = 0, 1, 2, \ldots.$$


The probability distribution of Y* is sometimes used by actuaries as a model for the distribution 
of the number of insurance claims made in a specific time period. 


Refer to Exercise 3.88. Derive the mean and variance of the random variable Y* 


a by using the result in Exercise 3.33 and the relationship Y* = Y — 1, where Y is geometric. 


*b directly, using the probability distribution for Y* given in Exercise 3.88. 


The Negative Binomial Probability 
Distribution (Optional) 


A random variable with a negative binomial distribution originates from a context 
much like the one that yields the geometric distribution. Again, we focus on inde- 
pendent and identical trials, each of which results in one of two outcomes: success or 
failure. The probability p of success stays the same from trial to trial. The geometric 
distribution handles the case where we are interested in the number of the trial on 
which the first success occurs. What if we are interested in knowing the number of 
the trial on which the second, third, or fourth success occurs? The distribution that 
applies to the random variable Y equal to the number of the trial on which the rth
success occurs (r = 2, 3, 4, etc.) is the negative binomial distribution.

The following steps closely resemble those in the previous section. Let us select 
fixed values for r and y and consider events A and B, where 


A = {the first (y − 1) trials contain (r − 1) successes}
and
B = {trial y results in a success}.


Because we assume that the trials are independent, it follows that A and B are independent
events, and previous assumptions imply that P(B) = p. Therefore,


$$p(y) = P(Y = y) = P(A \cap B) = P(A) \times P(B).$$


Notice that P(A) is 0 if (y − 1) < (r − 1) or, equivalently, if y < r. If y ≥ r, our
previous work with the binomial distribution implies that


$$P(A) = \binom{y - 1}{r - 1} p^{r-1} q^{y-r}.$$


Finally,


$$p(y) = \binom{y - 1}{r - 1} p^{r} q^{y-r}, \qquad y = r, r + 1, r + 2, \ldots.$$


DEFINITION 3.9


A random variable Y is said to have a negative binomial probability distribution
if and only if


$$p(y) = \binom{y - 1}{r - 1} p^{r} q^{y-r}, \qquad y = r, r + 1, r + 2, \ldots, \quad 0 \le p \le 1.$$


EXAMPLE 3.14


A geological study indicates that an exploratory oil well drilled in a particular region
should strike oil with probability .2. Find the probability that the third oil strike comes
on the fifth well drilled.


Solution


Assuming independent drillings and probability .2 of striking oil with any one well,
let Y denote the number of the trial on which the third oil strike occurs. Then it is
reasonable to assume that Y has a negative binomial distribution with p = .2. Because
we are interested in r = 3 and y = 5,


$$P(Y = 5) = p(5) = \binom{4}{2}(.2)^3(.8)^2 = 6(.008)(.64) = .0307.$$

If r = 2, 3, 4, ... and Y has a negative binomial distribution with success probability
p, $P(Y = y_0) = p(y_0)$ can be found by using the R (or S-Plus) command
dnbinom(y0-r, r, p). If we wanted to use R to obtain p(5) in Example 3.14, we
use the command dnbinom(2, 3, .2). Alternatively, $P(Y \le y_0)$ is found by using
the R (or S-Plus) command pnbinom(y0-r, r, p). Note that the first argument in
these commands is the value $y_0 - r$, not the value $y_0$. This is because some authors
prefer to define the negative binomial distribution to be that of the random variable
Y* = the number of failures before the rth success. In our formulation, the negative
binomial random variable, Y, is interpreted as the number of the trial on which
the rth success occurs. In Exercise 3.100, you will see that Y* = Y − r. Due to
this relationship between the two versions of negative binomial random variables,
$P(Y = y_0) = P(Y - r = y_0 - r) = P(Y^* = y_0 - r)$. R computes probabilities
associated with Y*, explaining why the arguments for dnbinom and pnbinom are
$y_0 - r$ instead of $y_0$.
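

For readers working outside R, the same probabilities can be computed from the formula in the definition of the negative binomial distribution. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the helper names `nbinom_pmf` and `nbinom_cdf` are illustrative and, unlike the R commands, take the trial number y itself rather than the number of failures y − r.

```python
from math import comb

def nbinom_pmf(y, r, p):
    """P(Y = y), where Y is the number of the trial on which the r-th success occurs."""
    if y < r:
        return 0.0
    return comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r)

def nbinom_cdf(y, r, p):
    """P(Y <= y), obtained by summing the probability function from r up to y."""
    return sum(nbinom_pmf(k, r, p) for k in range(r, y + 1))

p5 = nbinom_pmf(5, 3, 0.2)        # the probability requested in Example 3.14
upto5 = nbinom_cdf(5, 3, 0.2)     # P(Y <= 5) = p(3) + p(4) + p(5)
print(round(p5, 4), round(upto5, 5))
```

Because the argument here is the trial number, no shift by r is needed; the shift appears only when a package parameterizes the distribution by the number of failures.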

The mean and variance of a random variable with a negative binomial distribution 
can be derived directly from Definitions 3.4 and 3.5 by using techniques like those 
previously illustrated. However, summing the resulting infinite series is somewhat 
tedious. These derivations will be much easier after we have developed some of the 
techniques of Chapter 5. For now, we state the following theorem without proof. 


THEOREM 3.9


If Y is a random variable with a negative binomial distribution,


$$\mu = E(Y) = \frac{r}{p} \qquad \text{and} \qquad \sigma^2 = V(Y) = \frac{r(1 - p)}{p^2}.$$


EXAMPLE 3.15


A large stockpile of used pumps contains 20% that are in need of repair. A maintenance
worker is sent to the stockpile with three repair kits. She selects pumps at random and
tests them one at a time. If the pump works, she sets it aside for future use. However,
if the pump does not work, she uses one of her repair kits on it. Suppose that it takes
10 minutes to test a pump that is in working condition and 30 minutes to test and
repair a pump that does not work. Find the mean and variance of the total time it takes
the maintenance worker to use her three repair kits.


Solution


Let Y denote the number of the trial on which the third nonfunctioning pump is
found. It follows that Y has a negative binomial distribution with p = .2. Thus,
E(Y) = 3/(.2) = 15 and $V(Y) = 3(.8)/(.2)^2 = 60$. Because it takes an additional
20 minutes to repair each defective pump, the total time necessary to use the three
kits is

$$T = 10Y + 3(20).$$

Using the result derived in Exercise 3.33, we see that

$$E(T) = 10E(Y) + 60 = 10(15) + 60 = 210$$

and

$$V(T) = 10^2 V(Y) = 100(60) = 6000.$$

Thus, the total time necessary to use all three kits has mean 210 and standard deviation
$\sqrt{6000} = 77.46$.
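

The mean and variance in Example 3.15 can be cross-checked without Theorem 3.9 by summing over the probability function of Y directly. This is only a sketch: the helper `nbinom_pmf` is an illustrative name, and stopping the sums at y = 999 is an arbitrary truncation that is harmless here because the remaining probabilities are vanishingly small.

```python
from math import comb

def nbinom_pmf(y, r, p):
    """P(Y = y), where Y is the number of the trial on which the r-th success occurs."""
    return comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r)

r, p = 3, 0.2                              # three repair kits, 20% of pumps need repair
ys = range(r, 1000)                        # truncated range of trial numbers
times = [10 * y + 3 * 20 for y in ys]      # T = 10Y + 3(20), in minutes
probs = [nbinom_pmf(y, r, p) for y in ys]

mean_t = sum(t * w for t, w in zip(times, probs))
var_t = sum((t - mean_t) ** 2 * w for t, w in zip(times, probs))
print(mean_t, var_t, var_t ** 0.5)
```

The results should be close to 210, 6000, and 77.46, in agreement with the values obtained from the theorem and the rules for a linear function of Y.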


Exercises


3.90 The employees of a firm that manufactures insulation are being tested for indications of asbestos
in their lungs. The firm is requested to send three employees who have positive indications
of asbestos on to a medical center for further testing. If 40% of the employees have positive
indications of asbestos in their lungs, find the probability that ten employees must be tested in
order to find three positives.


3.91 Refer to Exercise 3.90. If each test costs $20, find the expected value and variance of the total
cost of conducting the tests necessary to locate the three positives.


3.92 Ten percent of the engines manufactured on an assembly line are defective. If engines are
randomly selected one at a time and tested, what is the probability that the first nondefective
engine will be found on the second trial?




3.93 Refer to Exercise 3.92. What is the probability that the third nondefective engine will be found

a on the fifth trial?
b on or before the fifth trial?


3.94 Refer to Exercise 3.92. Find the mean and variance of the number of the trial on which

a the first nondefective engine is found.
b the third nondefective engine is found.


3.95 Refer to Exercise 3.92. Given that the first two engines tested were defective, what is the
probability that at least two more engines must be tested before the first nondefective is found?


3.96 The telephone lines serving an airline reservation office are all busy about 60% of the time.

a If you are calling this office, what is the probability that you will complete your call on the
first try? The second try? The third try?

b If you and a friend must both complete calls to this office, what is the probability that a
total of four tries will be necessary for both of you to get through?


3.97 A geological study indicates that an exploratory oil well should strike oil with probability .2.

a What is the probability that the first strike comes on the third well drilled?
b What is the probability that the third strike comes on the seventh well drilled?
c What assumptions did you make to obtain the answers to parts (a) and (b)?
d Find the mean and variance of the number of wells that must be drilled if the company
wants to set up three producing wells.


Consider the negative binomial distribution given in Definition 3.9.


a Show that if y ≥ r + 1,
$$\frac{p(y)}{p(y - 1)} = \left(\frac{y - 1}{y - r}\right) q.$$
This establishes a recursive relationship between successive negative binomial probabilities, because
$$p(y) = p(y - 1) \times \left(\frac{y - 1}{y - r}\right) q.$$

b Show that
$$\frac{p(y)}{p(y - 1)} = \left(\frac{y - 1}{y - r}\right) q > 1 \quad \text{if } y < \frac{r - q}{1 - q}.$$
Similarly, $\frac{p(y)}{p(y - 1)} < 1$ if $y > \frac{r - q}{1 - q}$.

c Apply the result in part (b) for the case r = 7, p = .5 to determine the values of y for
which p(y) > p(y − 1).

In a sequence of independent identical trials with two possible outcomes on each trial, S and
F, and with P(S) = p, what is the probability that exactly y trials will occur before the rth
success?


If Y is a negative binomial random variable, define Y* = Y − r. If Y is interpreted as the
number of the trial on which the rth success occurs, then Y* can be interpreted as the number
of failures before the rth success.


a If Y* = Y − r, P(Y* = y) = P(Y − r = y) = P(Y = y + r) for y = 0, 1, 2, ..., show that
$$P(Y^* = y) = \binom{y + r - 1}{r - 1} p^{r} q^{y}, \qquad y = 0, 1, 2, \ldots.$$

b Derive the mean and variance of the random variable Y* by using the relationship Y* =
Y − r, where Y is negative binomial and the result in Exercise 3.33.


*3.101 a We observe a sequence of independent identical trials with two possible outcomes on each
trial, S and F, and with P(S) = p. The number of the trial on which we observe the fifth 
success, Y, has a negative binomial distribution with parameters r = 5 and p. Suppose 
that we observe the fifth success on the eleventh trial. Find the value of p that maximizes 
P(Y = 11). 

b Generalize the result from part (a) to find the value of p that maximizes P(Y = yo) when 
Y has a negative binomial distribution with parameters r (known) and p. 


The Hypergeometric Probability 
Distribution 


In Example 3.6 we considered a population of voters, 40% of whom favored candidate 
Jones. A sample of voters was selected, and Y (the number favoring Jones) was to be 
observed. We concluded that if the sample size n was small relative to the population 
size N, the distribution of Y could be approximated by a binomial distribution. We also 
determined that if n was large relative to N, the conditional probability of selecting 
a supporter of Jones on a later draw would be significantly affected by the observed 
preferences of persons selected on earlier draws. Thus the trials were not independent 
and the probability distribution for Y could not be approximated adequately by a 
binomial probability distribution. Consequently, we need to develop the probability 
distribution for Y when n is large relative to N. 

Suppose that a population contains a finite number N of elements that possess 
one of two characteristics. Thus, r of the elements might be red and b = N — r, 
black. A sample of n elements is randomly selected from the population, and the 
random variable of interest is Y, the number of red elements in the sample. This 
random variable has what is known as the hypergeometric probability distribution. 
For example, the number of workers who are women, Y, in Example 3.1 has the 
hypergeometric distribution. 

The hypergeometric probability distribution can be derived by using the combina- 
torial theorems given in Section 2.6 and the sample-point approach. A sample point 
in the sample space S will correspond to a unique selection of n elements, some red 
and the remainder black. As in the binomial experiment, each sample point can be 
characterized by an n-tuple whose elements correspond to a selection of n elements 
from the total of N. If each element in the population were numbered from 1 to N, the 
sample point indicating the selection of items 5, 7, 8, 64, 17,..., 87 would appear 
as the n-tuple 


\[
\underbrace{(5,\ 7,\ 8,\ 64,\ 17,\ \ldots,\ 87)}_{n\ \text{positions}}.
\]


The total number of sample points in S, therefore, will equal the number of ways of 
selecting a subset of n elements from a population of N, or $\binom{N}{n}$. Because random 
selection implies that all sample points are equiprobable, the probability of a sample 
point in S is 

\[
P(E_i) = \frac{1}{\binom{N}{n}}, \qquad \text{all } E_i \in S.
\]

The total number of sample points in the numerical event Y = y is the number 
of sample points in S that contain y red and (n − y) black elements. This number 
can be obtained by applying the mn rule (Section 2.6). The number of ways 
of selecting y red elements to fill y positions in the n-tuple representing a sample 
point is the number of ways of selecting y from a total of r, or $\binom{r}{y}$. [We use 
the convention $\binom{a}{b} = 0$ if b > a.] The total number of ways of selecting (n − y) 
black elements to fill the remaining (n − y) positions in the n-tuple is the number 
of ways of selecting (n − y) black elements from a possible (N − r), or $\binom{N-r}{n-y}$. 
Then the number of sample points in the numerical event Y = y is the number of 
ways of combining a set of y red and (n − y) black elements. By the mn rule, this 
is the product $\binom{r}{y} \times \binom{N-r}{n-y}$. Summing the probabilities of the sample points in the 
numerical event Y = y (multiplying the number of sample points by the common 
probability per sample point), we obtain the hypergeometric probability function. 


A random variable Y is said to have a hypergeometric probability distribution 
if and only if 

\[
p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}},
\]

where y is an integer 0, 1, 2, ..., n, subject to the restrictions y ≤ r and 
n − y ≤ N − r. 


With the convention $\binom{a}{b} = 0$ if b > a, it is clear that p(y) ≥ 0 for the 
hypergeometric probabilities. The fact that the hypergeometric probabilities sum to 1 
follows from the fact that 

\[
\sum_{i=0}^{n} \binom{r}{i}\binom{N-r}{n-i} = \binom{N}{n}.
\]


A sketch of the proof of this result is outlined in Exercise 3.216. 
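
The counting identity above can also be checked numerically for a particular case. The following is a minimal sketch in Python, not part of the original text; the function name hypergeom_pmf and the values N = 20, r = 5, n = 10 are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(y, r, N, n):
    """p(y) = C(r, y) * C(N - r, n - y) / C(N, n); zero outside the support."""
    if y < 0 or y > n or y > r or n - y > N - r:
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Assumed illustrative population: N = 20 elements, r = 5 red, sample of n = 10.
N, r, n = 20, 5, 10

# Left side of the identity, term by term (math.comb returns 0 when i > r).
counts = [comb(r, i) * comb(N - r, n - i) for i in range(n + 1)]
total_probability = sum(hypergeom_pmf(y, r, N, n) for y in range(n + 1))

print(sum(counts) == comb(N, n))    # the counting identity, checked exactly
print(round(total_probability, 12)) # the probabilities add to 1
```

Because math.comb works with exact integers, the identity is verified without rounding error; only the final probabilities are floating-point values.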


EXAMPLE 3.16 

An important problem encountered by personnel directors and others faced with the 
selection of the best in a finite set of elements is exemplified by the following scenario. 
From a group of 20 Ph.D. engineers, 10 are randomly selected for employment. What 
is the probability that the 10 selected include all the 5 best engineers in the group of 20? 

Solution 

For this example N = 20, n = 10, and r = 5. That is, there are only 5 in the set of 5 
best engineers, and we seek the probability that Y = 5, where Y denotes the number 
of best engineers among the ten selected. Then 

\[
p(5) = \frac{\binom{5}{5}\binom{15}{5}}{\binom{20}{10}} = \left(\frac{15!}{5!\,10!}\right)\left(\frac{10!\,10!}{20!}\right) = \frac{21}{1292} = .0162.
\]


Suppose that a population of size N consists of r units with the attribute and 
N − r without. If a sample of size n is taken, without replacement, and Y is the 
number of items with the attribute in the sample, P(Y = y₀) = p(y₀) can be 
found by using the R (or S-Plus) command dhyper(y0, r, N-r, n). The command 
dhyper(5, 5, 15, 10) yields the value for p(5) in Example 3.16. Alternatively, 
P(Y ≤ y₀) is found by using the R (or S-Plus) command phyper(y0, r, N-r, n). 

The mean and variance of a random variable with a hypergeometric distribution 
can be derived directly from Definitions 3.4 and 3.5. However, deriving closed form 
expressions for the resulting summations is somewhat tedious. In Chapter 5 we will 
develop methods that permit a much simpler derivation of the results presented in the 
following theorem. 


THEOREM 3.10 

If Y is a random variable with a hypergeometric distribution, 

\[
\mu = E(Y) = \frac{nr}{N} \quad\text{and}\quad \sigma^2 = V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).
\]


Although the mean and the variance of the hypergeometric random variable seem to 
be rather complicated, they bear a striking resemblance to the mean and variance of 
a binomial random variable. Indeed, if we define p = r/N and q = 1 − p = (N − r)/N, we 
can re-express the mean and variance of the hypergeometric as μ = np and 

\[
\sigma^2 = npq\left(\frac{N-n}{N-1}\right).
\]

You can view the factor 

\[
\frac{N-n}{N-1}
\]

in V(Y) as an adjustment that is appropriate when n is large relative to N. For fixed 
n, as N → ∞, 

\[
\frac{N-n}{N-1} \to 1.
\]
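
The closed forms in Theorem 3.10 can be compared with the sums that define E(Y) and V(Y). The sketch below is in Python rather than R. The function dhyper defined here is only a hand-written stand-in that copies the argument order of the R command, and the values are those of Example 3.16.

```python
from math import comb

def dhyper(y, r, b, n):
    """Hand-written stand-in (an assumption, not R itself) for dhyper(y, r, b, n):
    probability of y items with the attribute in n draws, where r items have
    the attribute and b = N - r do not."""
    if y < 0 or y > n or y > r or n - y > b:
        return 0.0
    return comb(r, y) * comb(b, n - y) / comb(r + b, n)

N, r, n = 20, 5, 10                  # values from Example 3.16
p5 = dhyper(5, r, N - r, n)          # p(5) in Example 3.16

# Mean and variance straight from the definitions of E(Y) and E[(Y - mu)^2].
mean = sum(y * dhyper(y, r, N - r, n) for y in range(n + 1))
var = sum((y - mean) ** 2 * dhyper(y, r, N - r, n) for y in range(n + 1))

# Closed forms stated in Theorem 3.10.
mean_formula = n * r / N
var_formula = n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))

print(round(p5, 4))
print(round(mean, 6), round(mean_formula, 6))
print(round(var, 6), round(var_formula, 6))
```

The two routes should agree up to floating-point rounding, which illustrates the theorem for one case without proving it.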


EXAMPLE 3.17 

An industrial product is shipped in lots of 20. Testing to determine whether an item 
is defective is costly, and hence the manufacturer samples his production rather than 
using a 100% inspection plan. A sampling plan, constructed to minimize the number 
of defectives shipped to customers, calls for sampling five items from each lot and 
rejecting the lot if more than one defective is observed. (If the lot is rejected, each 
item in it is later tested.) If a lot contains four defectives, what is the probability that 
it will be rejected? What is the expected number of defectives in the sample of size 5? 
What is the variance of the number of defectives in the sample of size 5? 

Solution 

Let Y equal the number of defectives in the sample. Then N = 20, r = 4, and n = 5. 
The lot will be rejected if Y = 2, 3, or 4. Then 

\[
\begin{aligned}
P(\text{rejecting the lot}) = P(Y \ge 2) &= p(2) + p(3) + p(4) \\
&= 1 - p(0) - p(1) \\
&= 1 - \frac{\binom{4}{0}\binom{16}{5}}{\binom{20}{5}} - \frac{\binom{4}{1}\binom{16}{4}}{\binom{20}{5}} \\
&= 1 - .2817 - .4696 = .2487.
\end{aligned}
\]

The mean and variance of the number of defectives in the sample of size 5 are 

\[
\mu = \frac{(5)(4)}{20} = 1 \quad\text{and}\quad \sigma^2 = 5\left(\frac{4}{20}\right)\left(\frac{20-4}{20}\right)\left(\frac{20-5}{20-1}\right) = .632.
\]


Example 3.17 involves sampling a lot of N industrial products, of which r are 
defective. The random variable of interest is Y, the number of defectives in a sample 
of size n. As noted in the beginning of this section, Y possesses an approximately 
binomial distribution when N is large and n is relatively small. Consequently, we 
would expect the probabilities assigned to values of Y by the hypergeometric 
distribution to approach those assigned by the binomial distribution as N becomes large 
and r/N, the fraction defective in the population, is held constant and equal to p. 
You can verify this expectation by using limit theorems encountered in your calculus 
courses to show that 

\[
\lim_{N \to \infty} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} p^y (1-p)^{n-y},
\]

where 

\[
\frac{r}{N} = p.
\]

(The proof of this result is omitted.) Hence, for a fixed fraction defective p = r/N, the 
hypergeometric probability function converges to the binomial probability function 
as N becomes large. 
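
Both the arithmetic of Example 3.17 and the limiting behavior just described can be explored numerically. The sketch below is one way to do so in Python. The helper names hyper_pmf and binom_pmf, the choice y = 1, and the sequence of lot sizes are assumptions made for illustration.

```python
from math import comb

def hyper_pmf(y, r, N, n):
    """Hypergeometric p(y); returns 0 when y lies outside the allowed range."""
    if y < 0 or y > min(n, r) or n - y > N - r:
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

def binom_pmf(y, n, p):
    """Binomial p(y) with n trials and success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Example 3.17: lot of N = 20 with r = 4 defectives, sample of n = 5.
N, r, n = 20, 4, 5
p_reject = 1 - hyper_pmf(0, r, N, n) - hyper_pmf(1, r, N, n)
mean = n * r / N
variance = n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))
print(round(p_reject, 4), mean, round(variance, 3))

# Limit: hold r/N at p = 0.2 and let the lot size grow (assumed sizes).
p, y = 0.2, 1
gaps = []
for lot_size in (20, 200, 2000, 20000):
    defectives = lot_size // 5          # keeps r/N equal to p = 0.2
    gap = abs(hyper_pmf(y, defectives, lot_size, n) - binom_pmf(y, n, p))
    gaps.append(gap)
    print(lot_size, round(gap, 5))
```

The printed gaps shrink as the lot size grows, in line with the convergence of the hypergeometric probability function to the binomial for a fixed fraction defective.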


Exercises 


3.102 An urn contains ten marbles, of which five are green, two are blue, and three are red. Three 
marbles are to be drawn from the urn, one at a time without replacement. What is the probability 
that all three marbles drawn will be green? 


3.103 A warehouse contains ten printing machines, four of which are defective. A company selects 
five of the machines at random, thinking all are in working condition. What is the probability 
that all five of the machines are nondefective? 


3.104 Twenty identical-looking packets of white powder are such that 15 contain cocaine and 5 do 
not. Four packets were randomly selected, and the contents were tested and found to contain 
cocaine. Two additional packets were selected from the remainder and sold by undercover 
police officers to a single buyer. What is the probability that the 6 packets randomly selected 
are such that the first 4 all contain cocaine and the 2 sold to the buyer do not? 


3.105 In southern California, a growing number of individuals pursuing teaching credentials are 
choosing paid internships over traditional student teaching programs. A group of eight candi- 
dates for three local teaching positions consisted of five who had enrolled in paid internships 
and three who enrolled in traditional student teaching programs. All eight candidates appear 
to be equally qualified, so three are randomly selected to fill the open positions. Let Y be the 
number of internship trained candidates who are hired. 


a Does Y have a binomial or hypergeometric distribution? Why? 
b Find the probability that two or more internship trained candidates are hired. 


c What are the mean and standard deviation of Y? 


3.106 Refer to Exercise 3.103. The company repairs the defective ones at a cost of $50 each. Find 
the mean and variance of the total repair cost. 


3.107 Seeds are often treated with fungicides to protect them in poor draining, wet environments. 
A small-scale trial, involving five treated and five untreated seeds, was conducted prior to a 
large-scale experiment to explore how much fungicide to apply. The seeds were planted in wet 
soil, and the number of emerging plants was counted. If the solution was not effective and 
four plants actually sprouted, what is the probability that 


a all four plants emerged from treated seeds? 
b three or fewer emerged from treated seeds? 


c at least one emerged from untreated seeds? 


3.108 A shipment of 20 cameras includes 3 that are defective. What is the minimum number of 
cameras that must be selected if we require that P(at least 1 defective) > .8? 


3.109 A group of six software packages available to solve a linear programming problem has been 
ranked from 1 to 6 (best to worst). An engineering firm, unaware of the rankings, randomly se- 
lected and then purchased two of the packages. Let Y denote the number of packages purchased 
by the firm that are ranked 3, 4, 5, or 6. Give the probability distribution for Y. 


3.110 A corporation is sampling without replacement for n = 3 firms to determine the one from 
which to purchase certain supplies. The sample is to be selected from a pool of six firms, of 
which four are local and two are not local. Let Y denote the number of nonlocal firms among 
the three selected. 


a P(Y = 1). 
b P(Y > 1). 
c P(Y < 1). 


3.111 Specifications call for a thermistor to test out at between 9000 and 10,000 ohms at 25° Celsius. 
Ten thermistors are available, and three of these are to be selected for use. Let Y denote the 
number among the three that do not conform to specifications. Find the probability distributions 
for Y (in tabular form) under the following conditions: 


a Two thermistors do not conform to specifications among the ten that are available. 


b Four thermistors do not conform to specifications among the ten that are available. 


3.112 Used photocopy machines are returned to the supplier, cleaned, and then sent back out on lease 
agreements. Major repairs are not made, however, and as a result, some customers receive 
malfunctioning machines. Among eight used photocopiers available today, three are malfunc- 
tioning. A customer wants to lease four machines immediately. To meet the customer’s deadline, 
four of the eight machines are randomly selected and, without further checking, shipped to the 
customer. What is the probability that the customer receives 


a no malfunctioning machines? 


b at least one malfunctioning machine? 


3.113 A jury of 6 persons was selected from a group of 20 potential jurors, of whom 8 were African 
American and 12 were white. The jury was supposedly randomly selected, but it contained 
only 1 African American member. Do you have any reason to doubt the randomness of the 
selection? 


3.114 Refer to Exercise 3.113. If the selection process were really random, what would be the mean 
and variance of the number of African American members selected for the jury? 


3.115 Suppose that a radio contains six transistors, two of which are defective. Three transistors 
are selected at random, removed from the radio, and inspected. Let Y equal the number of 
defectives observed, where Y = 0, 1, or 2. Find the probability distribution for Y. Express 
your results graphically as a probability histogram. 


3.116 Simulate the experiment described in Exercise 3.115 by marking six marbles or coins so that 
two represent defectives and four represent nondefectives. Place the marbles in a hat, mix, 
draw three, and record Y, the number of defectives observed. Replace the marbles and repeat 
the process until n = 100 observations of Y have been recorded. Construct a relative frequency 
histogram for this sample and compare it with the population probability distribution (Exercise 
3.115). 


3.117 In an assembly-line production of industrial robots, gearbox assemblies can be installed in 
one minute each if holes have been properly drilled in the boxes and in ten minutes if 
the holes must be redrilled. Twenty gearboxes are in stock, 2 with improperly drilled holes. 
Five gearboxes must be selected from the 20 that are available for installation in the next five 
robots. 


a Find the probability that all 5 gearboxes will fit properly. 


b Find the mean, variance, and standard deviation of the time it takes to install these 
5 gearboxes. 


3.118 Five cards are dealt at random and without replacement from a standard deck of 52 cards. 
What is the probability that the hand contains all 4 aces if it is known that it contains at least 
3 aces? 


3.119 Cards are dealt at random and without replacement from a standard 52 card deck. What is the 
probability that the second king is dealt on the fifth card? 


3.120 The sizes of animal populations are often estimated by using a capture-tag-recapture method. 
In this method k animals are captured, tagged, and then released into the population. Some time 
later n animals are captured, and Y, the number of tagged animals among the n, is noted. The 
probabilities associated with Y are a function of N, the number of animals in the population, 
so the observed value of Y contains information on this unknown N. Suppose that k = 4 
animals are tagged and then released. A sample of n = 3 animals is then selected at random 
from the same population. Find P(Y = 1) as a function of N. What value of N will maximize 
P(Y = 1)? 


3.8 The Poisson Probability Distribution 


Suppose that we want to find the probability distribution of the number of automobile 
accidents at a particular intersection during a time period of one week. At first glance 
this random variable, the number of accidents, may not seem even remotely related 
to a binomial random variable, but we will see that an interesting relationship exists. 

Think of the time period, one week in this example, as being split up into n 
subintervals, each of which is so small that at most one accident could occur in it 
with probability different from zero. Denoting the probability of an accident in any 
subinterval by p, we have, for all practical purposes, 


P (no accidents occur in a subinterval) = 1 — p, 
P (one accident occurs in a subinterval) = p, 
P (more than one accident occurs in a subinterval) = 0. 


Then the total number of accidents in the week is just the total number of subin- 
tervals that contain one accident. If the occurrence of accidents can be regarded as 
independent from interval to interval, the total number of accidents has a binomial 
distribution. 

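Although there is no unique way to choose the subintervals, and we therefore 
know neither n nor p, it seems reasonable that as we divide the week into a greater 
number n of subintervals, the probability p of one accident in one of these shorter 
subintervals will decrease. Letting λ = np and taking the limit of the binomial 
probability $p(y) = \binom{n}{y} p^y (1-p)^{n-y}$ as n → ∞, we have 

\[
\begin{aligned}
\lim_{n \to \infty} \binom{n}{y} p^y (1-p)^{n-y}
&= \lim_{n \to \infty} \frac{n(n-1)\cdots(n-y+1)}{y!} \left(\frac{\lambda}{n}\right)^{y} \left(1 - \frac{\lambda}{n}\right)^{n-y} \\
&= \lim_{n \to \infty} \frac{\lambda^y}{y!} \left(1 - \frac{\lambda}{n}\right)^{n} \frac{n(n-1)\cdots(n-y+1)}{n^y} \left(1 - \frac{\lambda}{n}\right)^{-y} \\
&= \frac{\lambda^y}{y!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-y} \left(1 - \frac{1}{n}\right) \times \left(1 - \frac{2}{n}\right) \times \cdots \times \left(1 - \frac{y-1}{n}\right).
\end{aligned}
\]

Noting that 

\[
\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}
\]

and all other terms to the right of the limit have a limit of 1, we obtain 

\[
p(y) = \frac{\lambda^y}{y!} e^{-\lambda}.
\]

(Note: e = 2.718....) Random variables possessing this distribution are said to have 
a Poisson distribution. Hence, Y, the number of accidents per week, has the Poisson 
distribution just derived. 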
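
The limit just derived can be watched numerically by fixing λ and letting n grow with p = λ/n. The following Python sketch does this for one value of y. The helper names and the choices λ = 2 and y = 3 are illustrative assumptions, not values taken from the text.

```python
from math import comb, exp, factorial

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def poisson_pmf(y, lam):
    """Poisson probability p(y) = lam^y * e^(-lam) / y!."""
    return lam**y * exp(-lam) / factorial(y)

lam, y = 2.0, 3                  # assumed illustrative values
limit = poisson_pmf(y, lam)

gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n                  # subinterval probability shrinks as n grows
    gaps.append(abs(binom_pmf(y, n, p) - limit))
    print(n, round(binom_pmf(y, n, p), 6), round(limit, 6))
```

Each tenfold increase in n brings the binomial probability closer to the Poisson value, which is the behavior the limit argument predicts.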


Because the binomial probability function converges to the Poisson, the Poisson 
probabilities can be used to approximate their binomial counterparts for large n, 
small p, and λ = np less than, roughly, 7. Exercise 3.134 requires you to calculate 
corresponding binomial and Poisson probabilities and will demonstrate the adequacy 
of the approximation. 

The Poisson probability distribution often provides a good model for the 
probability distribution of the number Y of rare events that occur in space, time, volume, 
or any other dimension, where λ is the average value of Y. As we have noted, it 
provides a good model for the probability distribution of the number Y of automobile 
accidents, industrial accidents, or other types of accidents in a given unit of time. 
Other examples of random variables with approximate Poisson distributions are the 
number of telephone calls handled by a switchboard in a time interval, the number 
of radioactive particles that decay in a particular time period, the number of errors a 
typist makes in typing a page, and the number of automobiles using a freeway access 
ramp in a ten-minute interval. 


DEFINITION 3.11 

A random variable Y is said to have a Poisson probability distribution if and 
only if 

\[
p(y) = \frac{\lambda^y}{y!} e^{-\lambda}, \qquad y = 0, 1, 2, \ldots, \quad \lambda > 0.
\]


As we will see in Theorem 3.11, the parameter λ that appears in the formula for 
the Poisson distribution is actually the mean of the distribution. 


EXAMPLE 3.18 

Show that the probabilities assigned by the Poisson probability distribution satisfy 
the requirements that 0 ≤ p(y) ≤ 1 for all y and $\sum_y p(y) = 1$. 

Solution 

Because λ > 0, it is obvious that p(y) > 0 for y = 0, 1, 2, ..., and that p(y) = 0 
otherwise. Further, 

\[
\sum_{y=0}^{\infty} p(y) = \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} e^{-\lambda} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = e^{-\lambda} e^{\lambda} = 1
\]

because the infinite sum $\sum_{y=0}^{\infty} \lambda^y / y!$ is a series expansion of $e^{\lambda}$. Sums of special 
series are given in Appendix A1.11. 


EXAMPLE 3.19 

Suppose that a random system of police patrol is devised so that a patrol officer 
may visit a given beat location Y = 0, 1, 2, 3, ... times per half-hour period, with 
each location being visited an average of once per time period. Assume that Y 
possesses, approximately, a Poisson probability distribution. Calculate the probability 
that the patrol officer will miss a given location during a half-hour period. What is 
the probability that it will be visited once? Twice? At least once? 

Solution 

For this example the time period is a half-hour, and the mean number of visits per 
half-hour interval is λ = 1. Then 

\[
p(y) = \frac{(1)^y e^{-1}}{y!} = \frac{e^{-1}}{y!}, \qquad y = 0, 1, 2, \ldots.
\]

The event that a given location is missed in a half-hour period corresponds to (Y = 0), 
and 

\[
P(Y = 0) = p(0) = \frac{e^{-1}}{0!} = e^{-1} = .368.
\]

Similarly, 

\[
p(1) = \frac{e^{-1}}{1!} = e^{-1} = .368
\]

and 

\[
p(2) = \frac{e^{-1}}{2!} = \frac{e^{-1}}{2} = .184.
\]

The probability that the location is visited at least once is the event (Y ≥ 1). Then 

\[
P(Y \ge 1) = \sum_{y=1}^{\infty} p(y) = 1 - p(0) = 1 - e^{-1} = .632.
\]


If Y has a Poisson distribution with mean λ, P(Y = y₀) = p(y₀) can be found by 
using the R (or S-Plus) command dpois(y0, λ). If we wanted to use R to obtain 
p(2) in Example 3.19, we use the command dpois(2, 1). Alternatively, P(Y ≤ y₀) 
is found by using the R (or S-Plus) command ppois(y0, λ). 
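
For readers without R at hand, the same quantities can be computed from the probability function directly. The Python sketch below defines two small functions that borrow the names dpois and ppois. They are hand-written stand-ins assumed for illustration, not the R commands themselves, and they are applied to Example 3.19.

```python
from math import exp, factorial

def dpois(y, lam):
    """P(Y = y) for a Poisson random variable with mean lam (stand-in for the R command)."""
    return lam**y * exp(-lam) / factorial(y)

def ppois(y, lam):
    """P(Y <= y), accumulated term by term from dpois."""
    return sum(dpois(k, lam) for k in range(y + 1))

lam = 1  # Example 3.19: an average of one visit per half-hour period

print(round(dpois(0, lam), 3))      # location missed
print(round(dpois(1, lam), 3))      # visited once
print(round(dpois(2, lam), 3))      # visited twice
print(round(1 - ppois(0, lam), 3))  # visited at least once
```

Summing individual terms is adequate for small y; a table or statistical software remains the practical choice for large arguments.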


EXAMPLE 3.20 

A certain type of tree has seedlings randomly dispersed in a large area, with the mean 
density of seedlings being approximately five per square yard. If a forester randomly 
locates ten 1-square-yard sampling regions in the area, find the probability that none 
of the regions will contain seedlings. 

Solution 

If the seedlings really are randomly dispersed, the number of seedlings per region, 
Y, can be modeled as a Poisson random variable with λ = 5. (The average density is 
five per square yard.) Thus, 

\[
P(Y = 0) = p(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-5} = .006738.
\]

The probability that Y = 0 on ten independently selected regions is $(e^{-5})^{10}$ because 
the probability of the intersection of independent events is equal to the product of the 
respective probabilities. The resulting probability is extremely small. Thus, if this 
event actually occurred, we would seriously question the assumption of randomness, 
the stated average density of seedlings, or both. 


For your convenience, we provide in Table 3, Appendix 3, the partial sums 
$\sum_{y=0}^{a} p(y)$ for the Poisson probability distribution for many values of λ between 
.02 and 25. This table is laid out similarly to the table of partial sums for the bino- 
mial distribution, Table 1, Appendix 3. The following example illustrates the use of 
Table 3 and demonstrates that the Poisson probability distribution can approximate 
the binomial probability distribution. 


EXAMPLE 3.21 

Suppose that Y possesses a binomial distribution with n = 20 and p = .1. Find the 
exact value of P(Y ≤ 3) using the table of binomial probabilities, Table 1, Appendix 3. 
Use Table 3, Appendix 3, to approximate this probability, using a corresponding 
probability given by the Poisson distribution. Compare the exact and approximate 
values for P(Y ≤ 3). 

Solution 

According to Table 1, Appendix 3, the exact (accurate to three decimal places) value 
of P(Y ≤ 3) = .867. If W is a Poisson-distributed random variable with λ = np = 
20(.1) = 2, previous discussions indicate that P(Y ≤ 3) is approximately equal 
to P(W ≤ 3). Table 3, Appendix 3, [or the R command ppois(3, 2)], gives 
P(W ≤ 3) = .857. Thus, you can see that the Poisson approximation is quite good, 
yielding a value that differs from the exact value by only .01. 
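
The comparison in Example 3.21 does not depend on the tables. The sketch below recomputes both cumulative probabilities from their definitions in Python. The function names binom_cdf and poisson_cdf are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binom_cdf(a, n, p):
    """P(Y <= a) for a binomial random variable, summed term by term."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(a + 1))

def poisson_cdf(a, lam):
    """P(W <= a) for a Poisson random variable with mean lam."""
    return sum(lam**y * exp(-lam) / factorial(y) for y in range(a + 1))

n, p = 20, 0.1
exact = binom_cdf(3, n, p)          # binomial value
approx = poisson_cdf(3, n * p)      # Poisson approximation with lam = np = 2

print(round(exact, 3), round(approx, 3), round(abs(exact - approx), 3))
```

Changing n and p while holding np fixed gives a quick feel for when the approximation begins to degrade.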


In our derivation of the mean and variance of a random variable with the Poisson 
distribution, we again use the fundamental property that $\sum_y p(y) = 1$ for any discrete 
probability distribution. 


THEOREM 3.11 

If Y is a random variable possessing a Poisson distribution with parameter λ, 
then 

\[
\mu = E(Y) = \lambda \quad\text{and}\quad \sigma^2 = V(Y) = \lambda.
\]


Proof 

By definition, 

\[
E(Y) = \sum_{y} y\,p(y) = \sum_{y=0}^{\infty} y\,\frac{\lambda^y e^{-\lambda}}{y!}.
\]

Notice that the first term in this sum is equal to 0 (when y = 0), and, hence, 

\[
E(Y) = \sum_{y=1}^{\infty} y\,\frac{\lambda^y e^{-\lambda}}{y!} = \sum_{y=1}^{\infty} \frac{\lambda^y e^{-\lambda}}{(y-1)!}.
\]

As it stands, this quantity is not equal to the sum of the values of a probability 
function p(y) over all values of y, but we can change it to the proper form 
by factoring λ out of the expression and letting z = y − 1. Then the limits of 
summation become z = 0 (when y = 1) and z = ∞ (when y = ∞), and 

\[
E(Y) = \lambda \sum_{y=1}^{\infty} \frac{\lambda^{y-1} e^{-\lambda}}{(y-1)!} = \lambda \sum_{z=0}^{\infty} \frac{\lambda^{z} e^{-\lambda}}{z!}.
\]

Notice that $p(z) = \lambda^z e^{-\lambda}/z!$ is the probability function for a Poisson random 
variable, and $\sum_{z=0}^{\infty} p(z) = 1$. Therefore, E(Y) = λ. Thus, the mean of a 
Poisson random variable is the single parameter λ that appears in the expression 
for the Poisson probability function. 

We leave the derivation of the variance as Exercise 3.138. 
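
Theorem 3.11 can be illustrated numerically by truncating the infinite sums. The Python sketch below is a check for one assumed value, λ = 2.5, and is not a substitute for the derivation. It builds the probabilities from the ratio p(y)/p(y − 1) = λ/y and obtains the variance from E[Y(Y − 1)] + E(Y) − [E(Y)]². The function name and the cutoff of 100 terms are assumptions.

```python
from math import exp

def poisson_pmf_table(lam, terms):
    """First `terms` Poisson probabilities, using p(y) = p(y - 1) * lam / y."""
    probs = [exp(-lam)]                      # p(0) = e^(-lam)
    for y in range(1, terms):
        probs.append(probs[-1] * lam / y)
    return probs

lam = 2.5                                    # assumed illustrative mean
probs = poisson_pmf_table(lam, 100)          # tail beyond 100 terms is negligible here

mean = sum(y * p for y, p in enumerate(probs))
factorial_moment = sum(y * (y - 1) * p for y, p in enumerate(probs))
variance = factorial_moment + mean - mean**2

print(round(sum(probs), 10), round(mean, 10), round(variance, 10))
```

For this λ the truncated sums agree with λ to many decimal places; a larger λ would call for more terms.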


A common way to encounter a random variable with a Poisson distribution is 
through a model called a Poisson process. A Poisson process is an appropriate model 
for situations as described at the beginning of this section. If we observe a Poisson 
process and λ is the mean number of occurrences per unit (length, area, etc.), then 
Y = the number of occurrences in a units has a Poisson distribution with mean aλ. A 
key assumption in the development of the theory of Poisson process is independence 
of the numbers of occurrences in disjoint intervals (areas, etc.). See Hogg, Craig, and 
McKean (2005) for a theoretical development of the Poisson process. 


EXAMPLE 3.22 

Industrial accidents occur according to a Poisson process with an average of three 
accidents per month. During the last two months, ten accidents occurred. Does this 
number seem highly improbable if the mean number of accidents per month, μ, is 
still equal to 3? Does it indicate an increase in the mean number of accidents per 
month? 

Solution 

The number of accidents in two months, Y, has a Poisson probability distribution with 
mean λ* = 2(3) = 6. The probability that Y is as large as 10 is 

\[
P(Y \ge 10) = \sum_{y=10}^{\infty} \frac{6^y e^{-6}}{y!}.
\]

The tedious calculation required to find P(Y ≥ 10) can be avoided by using Table 3, 
Appendix 3, software such as R [ppois(9, 6) yields P(Y ≤ 9)]; or the empirical 
rule. From Theorem 3.11, 

\[
\mu = \lambda^* = 6, \qquad \sigma^2 = \lambda^* = 6, \qquad \sigma = \sqrt{6} = 2.45.
\]

The empirical rule tells us that we should expect Y to take values in the interval 
μ ± 2σ with a high probability. 

Notice that μ + 2σ = 6 + (2)(2.45) = 10.90. The observed number of accidents, 
Y = 10, does not lie more than 2σ from μ, but it is close to the boundary. 
Thus, the observed result is not highly improbable, but it may be sufficiently 
improbable to warrant an investigation. See Exercise 3.210 for the exact probability 
P(|Y − λ| ≤ 2σ). 
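
The tail probability in Example 3.22 can also be obtained by summing the first ten Poisson terms and subtracting from 1. The Python sketch below does so. The function ppois is a hand-written stand-in named after the R command, assumed here for illustration.

```python
from math import exp, factorial, sqrt

def ppois(a, lam):
    """P(Y <= a) for a Poisson random variable with mean lam (stand-in for the R command)."""
    return sum(lam**y * exp(-lam) / factorial(y) for y in range(a + 1))

lam_two_months = 2 * 3                            # mean for a two-month period
tail = 1 - ppois(9, lam_two_months)               # P(Y >= 10)
upper = lam_two_months + 2 * sqrt(lam_two_months) # mu + 2 * sigma

print(round(tail, 4), round(upper, 2))
```

The tail probability comes out a little above .08, consistent with the conclusion that ten accidents is unusual but not highly improbable.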


136  Chapter3 Discrete Random Variables and Their Probability Distributions 


3.121 


3.122 


3.123 
3.124 


3.125 


3.126 


3.127 


3.128 


*3.129 


3.130 


3.131 


3.132 


Exercises 


Let Y denote a random variable that has a Poisson distribution with mean A = 2. Find 
P(Y — 4). 

P(Y > 4). 

P(Y « 4). 

P(Y > 4|Y > 2). 


оо CE 


Customers arrive at a checkout counter in a department store according to a Poisson distribution 
at an average of seven per hour. During a given hour, what are the probabilities that 


a no more than three customers arrive? 
b atleast two customers arrive? 


C exactly five customers arrive? 
The random variable Y has a Poisson distribution and is such that p(0) = p(1). What is p(2)? 


Approximately 4% of silicon wafers produced by a manufacturer have fewer than two large 
flaws. If Y, the number of flaws per wafer, has a Poisson distribution, what proportion of the 
wafers have more than five large flaws? [Hint: Use Table 3, Appendix 3.] 


Refer to Exercise 3.122. If it takes approximately ten minutes to serve each customer, find 
the mean and variance of the total service time for customers arriving during a 1-hour period. 
(Assume that a sufficient number of servers are available so that no customer must wait for 
service.) Is it likely that the total service time will exceed 2.5 hours? 


Refer to Exercise 3.122. Assume that arrivals occur according to a Poisson process with an 
average of seven per hour. What is the probability that exactly two customers arrive in the 
two-hour period of time between 


a 2:00 P.M. and 4:00 P.M. (one continuous two-hour period)? 


b 1:00 P.M. and 2:00 P.M. or between 3:00 Р.М. and 4:00 P.M. (two separate one-hour periods 
that total two hours)? 


The number of typing errors made by a typist has a Poisson distribution with an average of 
four errors per page. If more than four errors appear on a given page, the typist must retype the 
whole page. What is the probability that a randomly selected page does not need to be retyped? 


Cars arrive at a toll both according to a Poisson process with mean 80 cars per hour. If the 
attendant makes a one-minute phone call, what is the probability that at least 1 car arrives 
during the call? 


Refer to Exercise 3.128. How long can the attendant’s phone call last if the probability is at 
least .4 that no cars arrive during the call? 


A parking lot has two entrances. Cars arrive at entrance I according to a Poisson distribution at 
an average of three per hour and at entrance II according to a Poisson distribution at an average 
of four per hour. What is the probability that a total of three cars will arrive at the parking lot in 
a given hour? (Assume that the numbers of cars arriving at the two entrances are independent.) 


The number of knots in a particular type of wood has a Poisson distribution with an average of 
1.5 knots in 10 cubic feet of the wood. Find the probability that a 10-cubic-foot block of the 
wood has at most 1 knot. 


The mean number of automobiles entering a mountain tunnel per two-minute period is one. An 
excessive number of cars entering the tunnel during a brief period of time produces a hazardous 


3.133 


3.134 


3.135 


3.136 


3.137 


3.138 


3.139 


*3.140 


3.141 


*3.142 


Exercises 137 


situation. Find the probability that the number of autos entering the tunnel during a two-minute 
period exceeds three. Does the Poisson model seem reasonable for this problem? 


Assume that the tunnel in Exercise 3.132 is observed during ten two-minute intervals, thus 
giving ten independent observations Y;, Y2,..., Yio, on the Poisson random variable. Find the 
probability that Y > 3 during at least one of the ten two-minute intervals. 


Consider a binomial experiment for n = 20, p = .05. Use Table 1, Appendix 3, to calculate 
the binomial probabilities for Y = 0, 1, 2, 3, and 4. Calculate the same probabilities by using 
the Poisson approximation with А = np. Compare. 


A salesperson has found that the probability of a sale on a single contact is approximately .03. 
If the salesperson contacts 100 prospects, what is the approximate probability of making at 
least one sale? 


Increased research and discussion have focused on the number of illnesses involving the organ- 
ism Escherichia coli (10257:H7), which causes a breakdown of red blood cells and intestinal 
hemorrhages in its victims (http://www.hsus.org/ace/11831, March 24, 2004). Sporadic out- 
breaks of E.coli have occurred in Colorado at a rate of approximately 2.4 per 100,000 for a 
period of two years. 


a If this rate has not changed and if 100,000 cases from Colorado are reviewed for this year, 
what is the probability that at least 5 cases of E.coli will be observed? 

b If 100,000 cases from Colorado are reviewed for this year and the number of E.coli cases 
exceeded 5, would you suspect that the state's mean E.coli rate has changed? Explain. 


The probability that a mouse inoculated with a serum will contract a certain disease is .2. 
Using the Poisson approximation, find the probability that at most 3 of 30 inoculated mice will 
contract the disease. 


Let Y have a Poisson distribution with mean A. Find E[Y(Y — 1)] and then use this to show 
that V(Y) =A. 


In the daily production of a certain kind of rope, the number of defects per foot Y is assumed 
to have a Poisson distribution with mean à = 2. The profit per foot when the rope is sold is 
given by X, where X — 50 — 2Y — Y?. Find the expected profit per foot. 


A store owner has overstocked a certain item and decides to use the following promotion to 
decrease the supply. The item has a marked price of $100. For each customer purchasing the 
item during a particular day, the owner will reduce the price by a factor of one-half. Thus, 
the first customer will pay $50 for the item, the second will pay $25, and so on. Suppose that 
the number of customers who purchase the item during the day has a Poisson distribution with 
mean 2. Find the expected cost of the item at the end of the day. [Hint: The cost at the end of 
the day is 100(1/2)”, where Y is the number of customers who have purchased the item.] 


A food manufacturer uses an extruder (a machine that produces bite-size cookies and snack 
food) that yields revenue for the firm at a rate of $200 per hour when in operation. However, the 
extruder breaks down an average of two times every day it operates. If Y denotes the number 
of breakdowns per day, the daily revenue generated by the machine is R = 1600 — 50Y?. Find 
the expected daily revenue for the extruder. 


Let p(y) denote the probability function associated with a Poisson random variable with
mean $\lambda$.


a Show that the ratio of successive probabilities satisfies $\frac{p(y)}{p(y-1)} = \frac{\lambda}{y}$, for y = 1, 2, ....


b For which values of y is p(y) > p(y — 1)? 
c Notice that the result in part (a) implies that Poisson probabilities increase for a while as y
increases and decrease thereafter. Show that p(y) is maximized when y = the greatest integer
less than or equal to $\lambda$.


3.143 Refer to Exercise 3.142 (c). If the number of phone calls to the fire department, Y, in a day has
a Poisson distribution with mean 5.3, what is the most likely number of phone calls to the fire
department on any day?


3.144 Refer to Exercises 3.142 and 3.143. If the number of phone calls to the fire department, Y, in
a day has a Poisson distribution with mean 6, show that p(5) = p(6) so that 5 and 6 are the
two most likely values for Y.


3.9 Moments and Moment-Generating Functions


The parameters $\mu$ and $\sigma$ are meaningful numerical descriptive measures that locate
the center and describe the spread associated with the values of a random variable
Y. They do not, however, provide a unique characterization of the distribution of Y.
Many different distributions possess the same means and standard deviations. We 
now consider a set of numerical descriptive measures that (at least under certain 
conditions) uniquely determine p(y). 


DEFINITION 3.12 The kth moment of a random variable Y taken about the origin is defined to be
$E(Y^k)$ and is denoted by $\mu'_k$.


Notice in particular that the first moment about the origin is $E(Y) = \mu'_1 = \mu$ and
that $\mu'_2 = E(Y^2)$ is employed in Theorem 3.6 for finding $\sigma^2$.
Another useful moment of a random variable is one taken about its mean. 


DEFINITION 3.13 The kth moment of a random variable Y taken about its mean, or the kth central
moment of Y, is defined to be $E[(Y - \mu)^k]$ and is denoted by $\mu_k$.


In particular, $\sigma^2 = \mu_2$.

Let us concentrate on moments $\mu'_k$ about the origin where k = 1, 2, 3, ....
Suppose that two random variables Y and Z possess finite moments with $\mu'_{1Y} = \mu'_{1Z}$,
$\mu'_{2Y} = \mu'_{2Z}$, ..., $\mu'_{jY} = \mu'_{jZ}$, where j can assume any integer value. That is,
the two random variables possess identical corresponding moments about the origin.
Under some fairly general conditions, it can be shown that Y and Z have identical
probability distributions. Thus, a major use of moments is to approximate the probability
distribution of a random variable (usually an estimator or a decision maker).
Consequently, the moments $\mu'_k$, where k = 1, 2, 3, ..., are primarily of theoretical
value for k > 3.
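The remark that the mean and standard deviation do not pin down a distribution can be illustrated with a small sketch. The two probability functions below are hypothetical (they are not taken from the text); they share the same mean and variance but differ in their fourth moment.

```python
def raw_moment(pmf, k):
    """k-th moment about the origin, E(Y^k), for a pmf given as {y: p(y)}."""
    return sum((y ** k) * p for y, p in pmf.items())


def central_moment(pmf, k):
    """k-th moment about the mean, E[(Y - mu)^k]."""
    mu = raw_moment(pmf, 1)
    return sum(((y - mu) ** k) * p for y, p in pmf.items())


# Two hypothetical distributions chosen for illustration only.
pmf_a = {-1: 0.5, 1: 0.5}
pmf_b = {-2: 0.125, 0: 0.75, 2: 0.125}

for name, pmf in (("A", pmf_a), ("B", pmf_b)):
    print(name,
          "mean:", raw_moment(pmf, 1),
          "variance:", central_moment(pmf, 2),
          "fourth central moment:", central_moment(pmf, 4))
```

Both distributions have mean 0 and variance 1, so only the higher moments distinguish them.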

Yet another interesting expectation is the moment-generating function for a random
variable, which, figuratively speaking, packages all the moments for a random variable
into one simple expression. We will first define the moment-generating function and
then explain how it works.


DEFINITION 3.14 The moment-generating function m(t) for a random variable Y is defined to be
$m(t) = E(e^{tY})$. We say that a moment-generating function for Y exists if there
exists a positive constant b such that m(t) is finite for $|t| < b$.


Why is $E(e^{tY})$ called the moment-generating function for Y? From a series expansion
for $e^{ty}$, we have

$$e^{ty} = 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \frac{(ty)^4}{4!} + \cdots.$$

Then, assuming that $\mu'_k$ is finite for k = 1, 2, 3, ..., we have

$$E(e^{tY}) = \sum_y e^{ty} p(y) = \sum_y \left[ 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \cdots \right] p(y)$$

$$= \sum_y p(y) + t \sum_y y\,p(y) + \frac{t^2}{2!} \sum_y y^2 p(y) + \frac{t^3}{3!} \sum_y y^3 p(y) + \cdots$$

$$= 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots.$$

This argument involves an interchange of summations, which is justifiable if m(t)
exists. Thus, $E(e^{tY})$ is a function of all the moments $\mu'_k$ about the origin, for
k = 1, 2, 3, .... In particular, $\mu'_k$ is the coefficient of $t^k/k!$ in the series expansion of
m(t).
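The series argument can be checked numerically. The sketch below uses a hypothetical three-point probability function (an assumption made only for illustration) and compares $E(e^{tY})$ computed directly with the truncated series built from the moments about the origin.

```python
import math


def mgf(pmf, t):
    """Direct evaluation of m(t) = E(e^{tY}) for a pmf given as {y: p(y)}."""
    return sum(math.exp(t * y) * p for y, p in pmf.items())


def raw_moment(pmf, k):
    """k-th moment about the origin, E(Y^k)."""
    return sum((y ** k) * p for y, p in pmf.items())


def mgf_from_moments(pmf, t, order=25):
    """Truncated series: sum over k = 0..order of t^k / k! times the k-th moment."""
    return sum(t ** k / math.factorial(k) * raw_moment(pmf, k)
               for k in range(order + 1))


pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical probability function
t = 0.4
print(mgf(pmf, t), mgf_from_moments(pmf, t))
```

For a probability function with finitely many values the two numbers should agree to many decimal places once enough terms are kept.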

The moment-generating function possesses two important applications. First, if
we can find $E(e^{tY})$, we can find any of the moments for Y.


THEOREM 3.12 If m(t) exists, then for any positive integer k,

$$\left. \frac{d^k m(t)}{dt^k} \right]_{t=0} = m^{(k)}(0) = \mu'_k.$$

In other words, if you find the kth derivative of m(t) with respect to t and
then set t = 0, the result will be $\mu'_k$.


Proof $d^k m(t)/dt^k$, or $m^{(k)}(t)$, is the kth derivative of m(t) with respect to t. Because

$$m(t) = E(e^{tY}) = 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots,$$

it follows that

$$m^{(1)}(t) = \mu'_1 + \frac{2t}{2!}\mu'_2 + \frac{3t^2}{3!}\mu'_3 + \cdots,$$

$$m^{(2)}(t) = \mu'_2 + \frac{2t}{2!}\mu'_3 + \frac{3t^2}{3!}\mu'_4 + \cdots,$$

and, in general,

$$m^{(k)}(t) = \mu'_k + \frac{2t}{2!}\mu'_{k+1} + \frac{3t^2}{3!}\mu'_{k+2} + \cdots.$$

Setting t = 0 in each of the above derivatives, we obtain

$$m^{(1)}(0) = \mu'_1, \qquad m^{(2)}(0) = \mu'_2,$$

and, in general,

$$m^{(k)}(0) = \mu'_k.$$

These operations involve interchanging derivatives and infinite sums, which
can be justified if m(t) exists.
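As a small check of Theorem 3.12, consider an illustrative random variable (chosen here for simplicity, not taken from the text's examples) that takes the value 1 with probability p and the value 0 with probability q = 1 - p. Its moment-generating function and derivatives can be written out in full:

```latex
\begin{align*}
m(t) &= E(e^{tY}) = e^{t \cdot 0}\, q + e^{t \cdot 1}\, p = q + p e^{t}, \\
m^{(k)}(t) &= p e^{t} \quad \text{for every } k \ge 1, \\
m^{(k)}(0) &= p = \mu'_k .
\end{align*}
```

This agrees with direct calculation: because Y takes only the values 0 and 1, $Y^k = Y$, so $E(Y^k) = E(Y) = p$ for every positive integer k.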


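EXAMPLE 3.23 Find the moment-generating function m(t) for a Poisson distributed random variable
with mean $\lambda$.


Solution

$$m(t) = E(e^{tY}) = \sum_{y=0}^{\infty} e^{ty} p(y) = \sum_{y=0}^{\infty} e^{ty} \frac{\lambda^y e^{-\lambda}}{y!} = \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y e^{-\lambda}}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!}.$$

To complete the summation, consult Appendix A1.11 to find the Taylor series
expansion

$$\sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!} = e^{\lambda e^t}$$

or employ the method of Theorem 3.11. Thus, multiply and divide by $e^{\lambda e^t}$. Then

$$m(t) = e^{-\lambda} e^{\lambda e^t} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y e^{-\lambda e^t}}{y!}.$$

The quantity to the right of the summation sign is the probability function for a Poisson
random variable with mean $\lambda e^t$. Hence,

$$\sum_y p(y) = 1 \quad \text{and} \quad m(t) = e^{-\lambda} e^{\lambda e^t} (1) = e^{\lambda(e^t - 1)}.$$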
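The closed form just derived can be compared with a direct summation of $e^{ty} p(y)$. This is a minimal sketch; the number of terms kept (100) and the values of t and $\lambda$ are arbitrary choices for illustration. Successive Poisson probabilities are built from the ratio $p(y)/p(y-1) = \lambda/y$ so that no large factorials are needed.

```python
import math


def mgf_by_summation(t, lam, terms=100):
    """Approximate E(e^{tY}) for a Poisson variable by summing the first `terms` terms."""
    total = 0.0
    p = math.exp(-lam)              # p(0) = e^{-lambda}
    for y in range(terms):
        total += math.exp(t * y) * p
        p *= lam / (y + 1)          # p(y + 1) = p(y) * lambda / (y + 1)
    return total


def mgf_closed_form(t, lam):
    """The closed form e^{lambda (e^t - 1)}."""
    return math.exp(lam * (math.exp(t) - 1.0))


print(mgf_by_summation(0.5, 3.2), mgf_closed_form(0.5, 3.2))
```

The truncated sum is adequate only when the neglected tail is negligible, which holds for moderate $\lambda e^t$ such as the values used here.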


y 


The calculations in Example 3.23 are no more difficult than those in Theorem 3.11, 
where only the expected value for a Poisson random variable Y was calculated. Direct 
evaluation of the variance of Y through the use of Theorem 3.6 required that $E(Y^2)$ be
found by summing another series [actually, we obtained $E(Y^2)$ from $E[Y(Y - 1)]$ in
Exercise 3.138]. Example 3.24 illustrates the use of the moment-generating function 
of the Poisson random variable to calculate its mean and variance. 




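EXAMPLE 3.24 Use the moment-generating function of Example 3.23 and Theorem 3.12 to find the
mean, $\mu$, and variance, $\sigma^2$, for the Poisson random variable.


Solution According to Theorem 3.12, $\mu = \mu'_1 = m^{(1)}(0)$ and $\mu'_2 = m^{(2)}(0)$. Taking the first
and second derivatives of m(t), we obtain

$$m^{(1)}(t) = \frac{d}{dt}\left[ e^{\lambda(e^t - 1)} \right] = e^{\lambda(e^t - 1)} \cdot \lambda e^t,$$

$$m^{(2)}(t) = \frac{d^2}{dt^2}\left[ e^{\lambda(e^t - 1)} \right] = \frac{d}{dt}\left[ e^{\lambda(e^t - 1)} \cdot \lambda e^t \right] = e^{\lambda(e^t - 1)} \cdot (\lambda e^t)^2 + e^{\lambda(e^t - 1)} \cdot \lambda e^t.$$

Then, because

$$\mu = m^{(1)}(0) = \left\{ e^{\lambda(e^t - 1)} \cdot \lambda e^t \right\} \Big|_{t=0} = \lambda,$$

$$\mu'_2 = m^{(2)}(0) = \left\{ e^{\lambda(e^t - 1)} \cdot (\lambda e^t)^2 + e^{\lambda(e^t - 1)} \cdot \lambda e^t \right\} \Big|_{t=0} = \lambda^2 + \lambda,$$

Theorem 3.6 tells us that $\sigma^2 = E(Y^2) - \mu^2 = \mu'_2 - \mu^2 = \lambda^2 + \lambda - (\lambda)^2 = \lambda$. Notice
how easily we obtained $\mu'_2$ from m(t).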
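The derivatives in this example can also be approximated numerically, which gives a quick sanity check on the algebra. The sketch below uses central finite differences at t = 0; the step size h and the value $\lambda = 3.2$ are arbitrary illustrative choices, and the results are approximations rather than exact values.

```python
import math


def poisson_mgf(t, lam):
    """Moment-generating function e^{lambda (e^t - 1)} from Example 3.23."""
    return math.exp(lam * (math.exp(t) - 1.0))


def first_two_moments(mgf, h=1e-4):
    """Approximate m'(0) and m''(0) by central finite differences."""
    m_plus, m_zero, m_minus = mgf(h), mgf(0.0), mgf(-h)
    mu1 = (m_plus - m_minus) / (2.0 * h)
    mu2 = (m_plus - 2.0 * m_zero + m_minus) / (h * h)
    return mu1, mu2


lam = 3.2
mu1, mu2 = first_two_moments(lambda t: poisson_mgf(t, lam))
variance = mu2 - mu1 ** 2
print(mu1, mu2, variance)
```

Up to the finite-difference error, the three printed numbers should be close to $\lambda$, $\lambda^2 + \lambda$, and $\lambda$.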


The second (but primary) application of a moment-generating function is to prove 
that a random variable possesses a particular probability distribution p(y). If m(t) 
exists for a probability distribution p(y), it is unique. Also, if the moment-generating 
functions for two random variables Y and Z are equal (for all |t| < b for some 
b > 0), then Y and Z must have the same probability distribution. It follows that, if 
we can recognize the moment-generating function of a random variable Y to be one 
associated with a specific distribution, then Y must have that distribution. 

In summary, a moment-generating function is a mathematical expression that 
sometimes (but not always) provides an easy way to find moments associated with 
random variables. More important, it can be used to establish the equivalence of two 
probability distributions. 


EXAMPLE 3.25 Suppose that Y is a random variable with moment-generating function
$m_Y(t) = e^{3.2(e^t - 1)}$. What is the distribution of Y?


Solution In Example 3.23, we showed that the moment-generating function of a Poisson
distributed random variable with mean $\lambda$ is $m(t) = e^{\lambda(e^t - 1)}$. Note that the
moment-generating function of Y is exactly equal to the moment-generating function of a
Poisson distributed random variable with $\lambda = 3.2$. Because moment-generating functions
are unique, Y must have a Poisson distribution with mean 3.2.


Exercises


If Y has a binomial distribution with n trials and probability of success p, show that the
moment-generating function for Y is

$$m(t) = (pe^t + q)^n, \quad \text{where } q = 1 - p.$$


Differentiate the moment-generating function in Exercise 3.145 to find $E(Y)$ and $E(Y^2)$. Then
find $V(Y)$.


If Y has a geometric distribution with probability of success p, show that the moment-generating
function for Y is

$$m(t) = \frac{pe^t}{1 - qe^t}, \quad \text{where } q = 1 - p.$$


Differentiate the moment-generating function in Exercise 3.147 to find $E(Y)$ and $E(Y^2)$. Then
find $V(Y)$.


Refer to Exercise 3.145. Use the uniqueness of moment-generating functions to give the
distribution of a random variable with moment-generating function $m(t) = (.6e^t + .4)^3$.


Refer to Exercise 3.147. Use the uniqueness of moment-generating functions to give the
distribution of a random variable with moment-generating function $m(t) = \frac{.3e^t}{1 - .7e^t}$.


Refer to Exercise 3.145. If Y has moment-generating function $m(t) = (.7e^t + .3)^{10}$, what is
$P(Y < 5)$?


Refer to Example 3.23. If Y has moment-generating function $m(t) = e^{6(e^t - 1)}$, what is
$P(|Y - \mu| \le 2\sigma)$?
Find the distributions of the random variables that have each of the following moment-
generating functions:

a $m(t) = [(1/3)e^t + (2/3)]^5$.

b $m(t) = \frac{e^t}{2 - e^t}$.

c $m(t) = e^{2(e^t - 1)}$.

Refer to Exercise 3.153. By inspection, give the mean and variance of the random variables
associated with the moment-generating functions given in parts (a), (b), and (c).

Let $m(t) = (1/6)e^t + (2/6)e^{2t} + (3/6)e^{3t}$. Find the following:

a E(Y) 

b V(Y) 

c The distribution of Y 

Suppose that Y is a random variable with moment-generating function m(t).

a What is m(0)?

b If W = 3Y, show that the moment-generating function of W is m(3t).

c If X = Y - 2, show that the moment-generating function of X is $e^{-2t}m(t)$.

Refer to Exercise 3.156. 

a If W = 3Y, use the moment-generating function of W to show that E(W) = 3E(Y) and 
V(W) =9V(Y). 


b If X = Y - 2, use the moment-generating function of X to show that $E(X) = E(Y) - 2$
and $V(X) = V(Y)$.


If Y is a random variable with moment-generating function m(t) and if W is given by
W = aY + b, show that the moment-generating function of W is $e^{tb}m(at)$.


Use the result in Exercise 3.158 to prove that, if W = aY + b, then $E(W) = aE(Y) + b$ and
$V(W) = a^2 V(Y)$.


Suppose that Y is a binomial random variable based on n trials with success probability p and
let $Y^* = n - Y$.

a Use the result in Exercise 3.159 to show that $E(Y^*) = nq$ and $V(Y^*) = npq$, where
q = 1 - p.

b Use the result in Exercise 3.158 to show that the moment-generating function of $Y^*$ is
$m^*(t) = (qe^t + p)^n$, where q = 1 - p.

c Based on your answer to part (b), what is the distribution of $Y^*$?

d If Y is interpreted as the number of successes in a sample of size n, what is the interpretation
of $Y^*$?

e Based on your answer in part (d), why are the answers to parts (a), (b), and (c) “obvious”?


Refer to Exercises 3.147 and 3.158. If Y has a geometric distribution with success probability p,
consider $Y^* = Y - 1$. Show that the moment-generating function of $Y^*$ is
$m^*(t) = \frac{p}{1 - qe^t}$, where q = 1 - p.


Let $r(t) = \ln[m(t)]$ and $r^{(k)}(0)$ denote the kth derivative of r(t) evaluated for t = 0. Show that
$r^{(1)}(0) = \mu'_1 = \mu$ and $r^{(2)}(0) = \mu'_2 - (\mu'_1)^2 = \sigma^2$. [Hint: m(0) = 1.]


Use the results of Exercise 3.162 to find the mean and variance of a Poisson random variable
with $m(t) = e^{5(e^t - 1)}$. Notice that r(t) is easier to differentiate than m(t) in this case.


Probability-Generating 
Functions (Optional) 


An important class of discrete random variables is one in which Y represents a count 
and consequently takes integer values: Y = 0, 1, 2, 3, .... The binomial, geometric, 
hypergeometric, and Poisson random variables all fall in this class. The following 
examples give practical situations that result in integer-valued random variables. One, 
involving the theory of queues (waiting lines), is concerned with the number of persons 
(or objects) awaiting service at a particular point in time. Knowledge of the behavior of 
this random variable is important in designing manufacturing plants where production 
consists of a sequence of operations, each taking a different length of time to complete. 
An insufficient number of service stations for a particular production operation can 
result in a bottleneck, the formation of a queue of products waiting to be serviced, 
and a resulting slowdown in the manufacturing operation. Queuing theory is also 
important in determining the number of checkout counters needed for a supermarket 
and in designing hospitals and clinics. 

Integer-valued random variables are also important in studies of population growth. 
For example, epidemiologists are interested in the growth of bacterial populations and 
the growth of the number of persons afflicted by a particular disease. The numbers of 
elements in each of these populations are integer-valued random variables.

A mathematical device useful in finding the probability distributions and other
properties of integer-valued random variables is the probability-generating function.


DEFINITION 3.15 Let Y be an integer-valued random variable for which $P(Y = i) = p_i$, where
i = 0, 1, 2, .... The probability-generating function P(t) for Y is defined to
be

$$P(t) = E(t^Y) = p_0 + p_1 t + p_2 t^2 + \cdots = \sum_{i=0}^{\infty} p_i t^i$$

for all values of t such that P(t) is finite.


The reason for calling P(t) a probability-generating function is clear when we
compare P(t) with the moment-generating function m(t). In particular, the coefficient
of $t^i$ in P(t) is the probability $p_i$. Correspondingly, the coefficient of $t^i$ for m(t) is a
constant times the ith moment $\mu'_i$. If we know P(t) and can expand it into a series,
we can determine p(y) as the coefficient of $t^y$.

Repeated differentiation of P(t) yields factorial moments for the random
variable Y.


DEFINITION 3.16 The kth factorial moment for a random variable Y is defined to be

$$\mu_{[k]} = E[Y(Y - 1)(Y - 2) \cdots (Y - k + 1)],$$

where k is a positive integer.


Notice that $\mu_{[1]} = E(Y) = \mu$. The second factorial moment, $\mu_{[2]} = E[Y(Y - 1)]$,
was useful in finding the variance for binomial, geometric, and Poisson random
variables in Theorem 3.7, Exercise 3.85, and Exercise 3.138, respectively.


THEOREM 3.13 If P(t) is the probability-generating function for an integer-valued random
variable, Y, then the kth factorial moment of Y is given by

$$\left. \frac{d^k P(t)}{dt^k} \right]_{t=1} = P^{(k)}(1) = \mu_{[k]}.$$


Proof Because

$$P(t) = p_0 + p_1 t + p_2 t^2 + p_3 t^3 + p_4 t^4 + \cdots,$$

it follows that

$$P^{(1)}(t) = \frac{dP(t)}{dt} = p_1 + 2p_2 t + 3p_3 t^2 + 4p_4 t^3 + \cdots,$$

$$P^{(2)}(t) = \frac{d^2 P(t)}{dt^2} = (2)(1)p_2 + (3)(2)p_3 t + (4)(3)p_4 t^2 + \cdots,$$

and, in general,

$$P^{(k)}(t) = \frac{d^k P(t)}{dt^k} = \sum_{y=k}^{\infty} y(y - 1)(y - 2) \cdots (y - k + 1)\,p(y)\,t^{y-k}.$$

Setting t = 1 in each of these derivatives, we obtain

$$P^{(1)}(1) = p_1 + 2p_2 + 3p_3 + 4p_4 + \cdots = \mu_{[1]} = E(Y),$$

$$P^{(2)}(1) = (2)(1)p_2 + (3)(2)p_3 + (4)(3)p_4 + \cdots = \mu_{[2]} = E[Y(Y - 1)],$$

and, in general,

$$P^{(k)}(1) = \sum_{y=k}^{\infty} y(y - 1)(y - 2) \cdots (y - k + 1)\,p(y) = E[Y(Y - 1)(Y - 2) \cdots (Y - k + 1)] = \mu_{[k]}.$$
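For a random variable with finitely many values, P(t) is a polynomial and the steps of the proof can be carried out mechanically. The sketch below assumes a hypothetical probability function $p_0, \ldots, p_3$ (not taken from the text), differentiates the coefficient list k times, evaluates at t = 1, and compares the result with the factorial moment computed directly from its definition.

```python
def differentiate(coeffs):
    """Coefficient list of the derivative of sum_i coeffs[i] * t**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, t):
    """Value of the polynomial sum_i coeffs[i] * t**i."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


def factorial_moment_from_pgf(probs, k):
    """k-th derivative of P(t), evaluated at t = 1."""
    coeffs = list(probs)
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return evaluate(coeffs, 1.0)


def factorial_moment_direct(probs, k):
    """E[Y(Y-1)...(Y-k+1)] computed term by term."""
    total = 0.0
    for y, p in enumerate(probs):
        falling = 1
        for j in range(k):
            falling *= (y - j)
        total += falling * p
    return total


probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical p_0, p_1, p_2, p_3
mean = factorial_moment_from_pgf(probs, 1)
variance = factorial_moment_from_pgf(probs, 2) + mean - mean ** 2
print(mean, variance)
```

The variance line uses $E(Y^2) = E[Y(Y - 1)] + E(Y)$, so $V(Y) = \mu_{[2]} + \mu - \mu^2$.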


EXAMPLE 3.26 Find the probability-generating function for a geometric random variable.


Solution Notice that $p_0 = 0$ because Y cannot assume this value. Then

$$P(t) = E(t^Y) = \sum_{y=1}^{\infty} t^y q^{y-1} p = \sum_{y=1}^{\infty} \frac{p}{q} (qt)^y = \frac{p}{q}\left[ qt + (qt)^2 + (qt)^3 + \cdots \right].$$

The terms in the series are those of an infinite geometric progression. If qt < 1, then

$$P(t) = \frac{p}{q}\left( \frac{qt}{1 - qt} \right) = \frac{pt}{1 - qt}, \quad \text{if } t < 1/q.$$

(For summation of the series, consult Appendix A1.11.)
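The closed form in Example 3.26 can be checked against a partial sum of the defining series, and expanding it back into powers of t recovers the geometric probabilities. This is a sketch under stated assumptions: the values of p and t are arbitrary, the partial sum keeps 500 terms, and the convergence check uses |qt| < 1, which is slightly stricter than the condition quoted in the example.

```python
def geometric_pgf_closed_form(t, p):
    """P(t) = p t / (1 - q t), used here only when |q t| < 1."""
    q = 1.0 - p
    if abs(q * t) >= 1.0:
        raise ValueError("the geometric series converges only when |q t| < 1")
    return p * t / (1.0 - q * t)


def geometric_pgf_partial_sum(t, p, terms=500):
    """Sum of t**y * q**(y - 1) * p for y = 1, ..., terms."""
    q = 1.0 - p
    return sum(t ** y * q ** (y - 1) * p for y in range(1, terms + 1))


def series_coefficients(p, count):
    """Coefficients of t^1, ..., t^count in the expansion of p t / (1 - q t)."""
    q = 1.0 - p
    coeffs, current = [], p   # the coefficient of t^1 is p
    for _ in range(count):
        coeffs.append(current)
        current *= q          # each higher power of t carries one more factor q
    return coeffs


print(geometric_pgf_closed_form(0.9, 0.3), geometric_pgf_partial_sum(0.9, 0.3))
print(series_coefficients(0.3, 3))
```

The coefficient of $t^y$ returned by `series_coefficients` is $q^{y-1}p$, the geometric probability p(y).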


EXAMPLE 3.27 Use P(t), Example 3.26, to find the mean of a geometric random variable.


Solution From Theorem 3.13, $\mu_{[1]} = \mu = P^{(1)}(1)$. Using the result in Example 3.26,

$$P^{(1)}(t) = \frac{d}{dt}\left( \frac{pt}{1 - qt} \right) = \frac{(1 - qt)p - (pt)(-q)}{(1 - qt)^2}.$$

Setting t = 1, we obtain

$$P^{(1)}(1) = \frac{p^2 + pq}{p^2} = \frac{p(p + q)}{p^2} = \frac{1}{p}.$$
Because we already have the moment-generating function to assist in finding the
moments of a random variable, of what value is P(t)? The answer is that it may be
difficult to find m(t) but much easier to find P(t). Thus, P(t) provides an additional
tool for finding the moments of a random variable. It may or may not be useful in a
given situation.

Finding the moments of a random variable is not the major use of the
probability-generating function. Its primary application is in deriving the probability function (and
hence the probability distribution) for other related integer-valued random variables.
For these applications, see Feller (1968) and Parzen (1992).


Exercises


Let Y denote a binomial random variable with n trials and probability of success p. Find the
probability-generating function for Y and use it to find E(Y).


Let Y denote a Poisson random variable with mean $\lambda$. Find the probability-generating function
for Y and use it to find E(Y) and V(Y).


Refer to Exercise 3.165. Use the probability-generating function found there to find $E(Y^3)$.


Tchebysheff’s Theorem


We have seen in Section 1.3 and Example 3.22 that if the probability or population
histogram is roughly bell-shaped and the mean and variance are known, the empirical
rule is of great help in approximating the probabilities of certain intervals. However,
in many instances, the shapes of probability histograms differ markedly from a mound
shape, and the empirical rule may not yield useful approximations to the probabilities
of interest. The following result, known as Tchebysheff’s theorem, can be used to
determine a lower bound for the probability that the random variable Y of interest
falls in an interval $\mu \pm k\sigma$.


THEOREM 3.14 Tchebysheff's Theorem Let Y be a random variable with mean $\mu$ and finite
variance $\sigma^2$. Then, for any constant k > 0,

$$P(|Y - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{or} \quad P(|Y - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$


Two important aspects of this result should be pointed out. First, the result applies
for any probability distribution, whether the probability histogram is bell-shaped or
not. Second, the results of the theorem are very conservative in the sense that the
actual probability that Y is in the interval $\mu \pm k\sigma$ usually exceeds the lower bound
for the probability, $1 - 1/k^2$, by a considerable amount. However, as discussed in
Exercise 3.169, for any k > 1, it is possible to construct a probability distribution
so that, for that k, the bound provided by Tchebysheff’s theorem is actually attained.
(You should verify that the results of the empirical rule do not contradict those
given by Theorem 3.14.) The proof of this theorem will be deferred to Section 4.10.
The usefulness of this theorem is illustrated in the following example.
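Before turning to the example, the parenthetical check on the empirical rule can be sketched in a few lines. The empirical-rule figures below are the usual approximations for mound-shaped distributions (about 68% and 95% within one and two standard deviations); the value 0.997 for three standard deviations is an assumed stand-in for "almost all" and is not quoted from the text.

```python
def tchebysheff_lower_bound(k):
    """Lower bound 1 - 1/k**2 on P(|Y - mu| < k * sigma), valid for any k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k ** 2


# Approximate empirical-rule proportions; 0.997 is an assumed value for "almost all".
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def compare():
    """Rows of (k, Tchebysheff lower bound, empirical-rule approximation)."""
    return [(k, tchebysheff_lower_bound(k), approx)
            for k, approx in sorted(EMPIRICAL_RULE.items())]


for k, bound, approx in compare():
    print(k, round(bound, 4), approx)
```

In every row the empirical-rule proportion is at least as large as the lower bound, so the two statements are consistent; the bound is simply much weaker because it must hold for every distribution.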
EXAMPLE 3.28 The number of customers per day at a sales counter, Y, has been observed for a long
period of time and found to have mean 20 and standard deviation 2. The probability
distribution of Y is not known. What can be said about the probability that, tomorrow,
Y will be greater than 16 but less than 24?


Solution We want to find P(16 < Y < 24). From Theorem 3.14 we know that, for any k > 0,
$P(|Y - \mu| < k\sigma) \ge 1 - 1/k^2$, or

$$P[(\mu - k\sigma) < Y < (\mu + k\sigma)] \ge 1 - \frac{1}{k^2}.$$

Because $\mu = 20$ and $\sigma = 2$, it follows that $\mu - k\sigma = 16$ and $\mu + k\sigma = 24$ if k = 2.
Thus,

$$P(16 < Y < 24) = P(\mu - 2\sigma < Y < \mu + 2\sigma) \ge 1 - \frac{1}{(2)^2} = \frac{3}{4}.$$

In other words, tomorrow’s customer total will be between 16 and 24 with a fairly
high probability (at least 3/4).

Notice that if $\sigma$ were 1, k would be 4, and

$$P(16 < Y < 24) = P(\mu - 4\sigma < Y < \mu + 4\sigma) \ge 1 - \frac{1}{(4)^2} = \frac{15}{16}.$$

Thus, the value of $\sigma$ has considerable effect on probabilities associated with intervals.
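The calculation in this example can be wrapped in a small helper, and the conservativeness of the bound can be seen by comparing it with one particular distribution that has the same mean and standard deviation. The binomial distribution with n = 25 and p = 0.8 used below is a hypothetical choice (mean 20, variance 4); the example itself says the true distribution is unknown.

```python
from math import comb


def tchebysheff_interval_bound(mu, sigma, lower, upper):
    """Lower bound on P(lower < Y < upper) for an interval symmetric about mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if abs((mu - lower) - (upper - mu)) > 1e-12 or upper <= mu:
        raise ValueError("interval must be symmetric about mu")
    k = (upper - mu) / sigma
    return 1.0 - 1.0 / k ** 2


def binomial_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)


bound_sigma_2 = tchebysheff_interval_bound(20, 2, 16, 24)   # k = 2
bound_sigma_1 = tchebysheff_interval_bound(20, 1, 16, 24)   # k = 4

# Hypothetical distribution with mean 20 and standard deviation 2.
exact = sum(binomial_pmf(y, 25, 0.8) for y in range(17, 24))  # P(16 < Y < 24)
print(bound_sigma_2, bound_sigma_1, exact)
```

For this assumed binomial distribution the exact probability is noticeably larger than the guaranteed 3/4, which is typical of how conservative the bound is.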
Exercises


Let Y be a random variable with mean 11 and variance 9. Using Tchebysheff’s theorem, find


a a lower bound for P(6 < Y < 16).
b the value of C such that $P(|Y - 11| \ge C) \le .09$.


Would you rather take a multiple-choice test or a full-recall test? If you have absolutely no
knowledge of the test material, you will score zero on a full-recall test. However, if you are
given 5 choices for each multiple-choice question, you have at least one chance in five of
guessing each correct answer! Suppose that a multiple-choice exam contains 100 questions,
each with 5 possible answers, and that you guess the answer to each of the questions.


a What is the expected value of the number Y of questions that will be correctly answered?
b Find the standard deviation of Y.
c Calculate the intervals $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$.
d If the results of the exam are curved so that 50 correct answers is a passing score, are you
likely to receive a passing score? Explain.


This exercise demonstrates that, in general, the results provided by Tchebysheff’s theorem
cannot be improved upon. Let Y be a random variable such that

$$p(-1) = \frac{1}{18}, \qquad p(0) = \frac{16}{18}, \qquad p(1) = \frac{1}{18}.$$

a Show that $E(Y) = 0$ and $V(Y) = 1/9$.

b Use the probability distribution of Y to calculate $P(|Y - \mu| \ge 3\sigma)$. Compare this exact
probability with the upper bound provided by Tchebysheff's theorem to see that the bound
provided by Tchebysheff's theorem is actually attained when k = 3.

*c In part (b) we guaranteed E(Y) = 0 by placing all probability mass on the values -1, 0,
and 1, with p(-1) = p(1). The variance was controlled by the probabilities assigned
to p(-1) and p(1). Using this same basic idea, construct a probability distribution for a
random variable X that will yield $P(|X - \mu_X| \ge 2\sigma_X) = 1/4$.

*d If any k > 1 is specified, how can a random variable W be constructed so that
$P(|W - \mu_W| \ge k\sigma_W) = 1/k^2$?

3.170 The U.S. Mint produces dimes with an average diameter of .5 inch and standard deviation .01.
Using Tchebysheff’s theorem, find a lower bound for the number of coins in a lot of 400 coins 
that are expected to have a diameter between .48 and .52. 


3.171 For a certain type of soil the number of wireworms per cubic foot has a mean of 100. Assuming
a Poisson distribution of wireworms, give an interval that will include at least 5/9 of the sample 
values of wireworm counts obtained from a large number of 1-cubic-foot samples. 


3.172 Refer to Exercise 3.115. Using the probability histogram, find the fraction of values in the 
population that fall within 2 standard deviations of the mean. Compare your result with that of 
Tchebysheff's theorem. 


3.173 A balanced coin is tossed three times. Let Y equal the number of heads observed. 


a Use the formula for the binomial probability distribution to calculate the probabilities
associated with Y = 0, 1, 2, and 3.

b Construct a probability distribution similar to the one in Table 3.1.

c Find the expected value and standard deviation of Y, using the formulas $E(Y) = np$ and
$V(Y) = npq$.

d Using the probability distribution from part (b), find the fraction of the population mea- 
surements lying within 1 standard deviation of the mean. Repeat for 2 standard deviations. 
How do your results compare with the results of Tchebysheff's theorem and the empirical 
rule? 


3.174 Suppose that a coin was definitely unbalanced and that the probability of a head was equal to
p = .1. Follow instructions (a), (b), (c), and (d) as stated in Exercise 3.173. Notice that the
probability distribution loses its symmetry and becomes skewed when p is not equal to 1/2.


3.175 In May 2005, Tony Blair was elected to an historic third term as the British prime minister.
A Gallup U.K. poll (http://gallup.com/poll/content/default.aspx?ci=1710, June 28, 2005) con-
ducted after Blair’s election indicated that only 32% of British adults would like to see their son 
or daughter grow up to become prime minister. If the same proportion of Americans would pre- 
fer that their son or daughter grow up to be president and 120 American adults are interviewed, 


a what is the expected number of Americans who would prefer their child grow up to be 
president? 

b what is the standard deviation of the number Y who would prefer that their child grow up
to be president? 


c is it likely that the number of Americans who prefer that their child grow up to be president 
exceeds 40? 


3.176 A national poll of 549 teenagers (aged 13 to 17) by the Gallup poll
(http://gallup.com/content/default.aspx?ci=17110, April, 2005) indicated that 85% "think that clothes that display gang
symbols" should be banned at school. If teenagers were really evenly split in their opinions
regarding banning of clothes that display gang symbols, comment on the probability of
observing this survey result (that is, observing 85% or more in a sample of 549 who are in favor
of banning clothes that display gang symbols). What assumption must be made about the
sampling procedure in order to calculate this probability? [Hint: Recall Tchebysheff’s theorem and
the empirical rule.]


For a certain section of a pine forest, the number of diseased trees per acre, Y, has a Poisson
distribution with mean $\lambda = 10$. The diseased trees are sprayed with an insecticide at a cost of
$3 per tree, plus a fixed overhead cost for equipment rental of $50. Letting C denote the total 
spraying cost for a randomly selected acre, find the expected value and standard deviation for 
C. Within what interval would you expect C to lie with probability at least .75? 


It is known that 10% of a brand of television tubes will burn out before their guarantee has 
expired. If 1000 tubes are sold, find the expected value and variance of Y, the number of original
tubes that must be replaced. Within what limits would Y be expected to fall? 


Refer to Exercise 3.91. In this exercise, we determined that the mean and variance of the costs 
necessary to find three employees with positive indications of asbestos poisoning were 150 and 
4500, respectively. Do you think it is highly unlikely that the cost of completing the tests will 
exceed $350? 


Summary 


This chapter has explored discrete random variables, their probability distributions, 
and their expected values. Calculating the probability distribution for a discrete ran- 
dom variable requires the use of the probabilistic methods of Chapter 2 to evaluate 
the probabilities of numerical events. Probability functions, p(y) = P(Y = y), 
were derived for binomial, geometric, negative binomial, hypergeometric, and Pois- 
son random variables. These probability functions are sometimes called probability 
mass functions because they give the probability (mass) assigned to each of the finite 
or countably infinite possible values for these discrete random variables. 

The expected values of random variables and functions of random variables pro- 
vided a method for finding the mean and variance of Y and consequently measures 
of centrality and variation for p(y). Much of the remaining material in the chapter 
was devoted to the techniques for acquiring expectations, which sometimes involved 
summing apparently intractable series. The techniques for obtaining closed-form expressions
for some of the resulting expected values included (1) use of the fact that
$\sum_y p(y) = 1$ for any discrete random variable and (2) $E(Y^2) = E[Y(Y - 1)] + E(Y)$.
The means and variances of several of the more common discrete distributions are 
summarized in Table 3.4. These results and more are also found in Table A2.1 in 
Appendix 2 and inside the back cover of this book. 

Table 3.5 gives the R (and S-Plus) procedures that yield $p(y_0) = P(Y = y_0)$
and $P(Y \le y_0)$ for random variables with binomial, geometric, negative binomial,
hypergeometric, and Poisson distributions.

We then discussed the moment-generating function associated with a random variable.
Although sometimes useful in finding $\mu$ and $\sigma$, the moment-generating function
is of primary value to the theoretical statistician for deriving the probability
distribution of a random variable. The moment-generating functions for most of the common
random variables are found in Appendix 2 and inside the back cover of this book.
Table 3.4 Means and variances for some common discrete random variables


Distribution         E(Y)         V(Y)
Binomial             $np$         $np(1 - p) = npq$
Geometric            $1/p$        $(1 - p)/p^2 = q/p^2$
Hypergeometric       $n(r/N)$     $n(r/N)((N - r)/N)((N - n)/(N - 1))$
Poisson              $\lambda$    $\lambda$
Negative binomial    $r/p$        $r(1 - p)/p^2 = rq/p^2$


Table 3.5 R (and S-Plus) procedures giving probabilities for some common discrete distributions


Distribution         P(Y = y0) = p(y0)       P(Y ≤ y0)
Binomial             dbinom(y0,n,p)          pbinom(y0,n,p)
Geometric            dgeom(y0-1,p)           pgeom(y0-1,p)
Hypergeometric       dhyper(y0,r,N-r,n)      phyper(y0,r,N-r,n)
Poisson              dpois(y0,λ)             ppois(y0,λ)
Negative binomial    dnbinom(y0-r,r,p)       pnbinom(y0-r,r,p)
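For readers working outside R, the probabilities in Table 3.5 can be sketched directly from the probability functions of this chapter. The Python function names below are hypothetical helpers written for this illustration, not part of any library. The geometric and negative binomial functions count the trial on which the success occurs, as in the text, which is why the R calls in the table are shifted by 1 and by r (R counts failures instead).

```python
from math import comb, exp, factorial


def binomial_pmf(y0, n, p):
    """P(Y = y0) for y0 successes in n trials."""
    return comb(n, y0) * p ** y0 * (1 - p) ** (n - y0)


def geometric_pmf(y0, p):
    """P(Y = y0), where Y is the trial of the first success (y0 = 1, 2, ...)."""
    return (1 - p) ** (y0 - 1) * p


def hypergeometric_pmf(y0, r, N, n):
    """P(Y = y0) when n items are drawn from N, of which r are of the type counted."""
    return comb(r, y0) * comb(N - r, n - y0) / comb(N, n)


def poisson_pmf(y0, lam):
    """P(Y = y0) for a Poisson variable with mean lam."""
    return lam ** y0 * exp(-lam) / factorial(y0)


def negative_binomial_pmf(y0, r, p):
    """P(Y = y0), where Y is the trial of the r-th success (y0 = r, r + 1, ...)."""
    return comb(y0 - 1, r - 1) * p ** r * (1 - p) ** (y0 - r)


def cumulative(pmf, y0, start, *params):
    """P(Y <= y0) by summing pmf from the smallest possible value `start`."""
    return sum(pmf(y, *params) for y in range(start, y0 + 1))


print(binomial_pmf(3, 10, 0.3), cumulative(binomial_pmf, 3, 0, 10, 0.3))
print(geometric_pmf(3, 0.5), cumulative(geometric_pmf, 3, 1, 0.5))
```

These helpers are meant for small arguments; for large n or y0 a library routine with better numerical behavior would be preferable.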


The probability-generating function is a useful device for deriving moments and 
probability distributions of integer-valued random variables. 

Finally, we gave Tchebysheff's theorem, a very useful result that permits approximating
certain probabilities when only the mean and variance are known.

To conclude this summary, we recall the primary objective of statistics: to make 
an inference about a population based on information contained in a sample. Draw- 
ing the sample from the population is the experiment. The sample is often a set of 
measurements of one or more random variables, and it is the observed event resulting 
from a single repetition of the experiment. Finally, making the inference about the 
population requires knowledge of the probability of occurrence of the observed sam- 
ple, which in turn requires knowledge of the probability distributions of the random 
variables that generated the sample. 
Supplementary Exercises


Four possibly winning numbers for a lottery (AB-4536, NH-7812, SO-7855, and ZY-3221)
arrive in the mail. You will win a prize if one of your numbers matches one of the winning 
numbers contained on a list held by those conducting the lottery. One first prize of $100,000, 
two second prizes of $50,000 each, and ten third prizes of $1000 each will be awarded. To be 
eligible to win, you need to mail the coupon back to the company at a cost of 33¢ for postage. 
No purchase is required. From the structure of the numbers that you received, it is obvious the 
numbers sent out consist of two letters followed by four digits. Assuming that the numbers 
you received were generated at random, what are your expected winnings from the lottery? Is 
it worth 33¢ to enter this lottery? 


Sampling for defectives from large lots of manufactured product yields a number of defectives, 
Y, that follows a binomial probability distribution. A sampling plan consists of specifying the 
number of items n to be included in a sample and an acceptance number a. The lot is accepted
if $Y \le a$ and rejected if $Y > a$. Let p denote the proportion of defectives in the lot. For n = 5
and a = 0, calculate the probability of lot acceptance if (a) p = 0, (b) p = .1, (c) p = .3,
(d) p = .5, (e) p = 1.0. A graph showing the probability of lot acceptance as a function of lot
fraction defective is called the operating characteristic curve for the sample plan. Construct
the operating characteristic curve for the plan n = 5, a = 0. Notice that a sampling plan is an
example of statistical inference. Accepting or rejecting a lot based on information contained in 
the sample is equivalent to concluding that the lot is either good or bad. “Good” implies that a 
low fraction is defective and that the lot is therefore suitable for shipment. 


Refer to Exercise 3.181. Use Table 1, Appendix 3, to construct the operating characteristic
curves for the following sampling plans:


a n = 10, a = 0.
b n = 10, a = 1.
c n = 10, a = 2.


For each sampling plan, calculate P(lot acceptance) for p = 0, .05, .1, .3, .5, and 1.0. Our 
intuition suggests that sampling plan (a) would be much less likely to accept bad lots than 
plans (b) and (c). A visual comparison of the operating characteristic curves will confirm this 
intuitive conjecture. 
A quality control engineer wishes to study alternative sampling plans: n = 5, a = 1 and
n = 25, a = 5. On a sheet of graph paper, construct the operating characteristic curves for
both plans, making use of acceptance probabilities at p = .05, p = .10, p = .20, p = .30,
and p = .40 in each case.


a If you were a seller producing lots with fraction defective ranging from p = 0 to p = .10, 
which of the two sampling plans would you prefer? 

b If you were a buyer wishing to be protected against accepting lots with fraction defective 
exceeding p = .30, which of the two sampling plans would you prefer? 


A city commissioner claims that 80% of the people living in the city favor garbage collection 
by contract to a private company over collection by city employees. To test the commissioner’s 
claim, 25 city residents are randomly selected, yielding 22 who prefer contracting to a private 
company. 


a If the commissioner’s claim is correct, what is the probability that the sample would contain
at least 22 who prefer contracting to a private company? 

b If the commissioner’s claim is correct, what is the probability that exactly 22 would prefer 
contracting to a private company? 

c Based on observing 22 in a sample of size 25 who prefer contracting to a private company,
what do you conclude about the commissioner's claim that 80% of city residents prefer 
contracting to a private company? 


Twenty students are asked to select an integer between 1 and 10. Eight choose either 4, 5 or 6. 


a If the students make their choices independently and each is as likely to pick one integer 
as any other, what is the probability that 8 or more will select 4, 5, or 6?

b Having observed eight students who selected 4, 5, or 6, what conclusion do you draw based 
on your answer to part (a)? 


Refer to Exercises 3.67 and 3.68. Let Y denote the number of the trial on which the first 
applicant with computer training was found. If each interview costs $30, find the expected 
value and variance of the total cost incurred interviewing candidates until an applicant with 
advanced computer training is found. Within what limits would you expect the interview costs 
to fall? 


Consider the following game: A player throws a fair die repeatedly until he rolls a 2, 3, 4, 5, or
6. In other words, the player continues to throw the die as long as he rolls 1s. When he rolls a
“non-1,” he stops.


a What is the probability that the player tosses the die exactly three times?

b What is the expected number of rolls needed to obtain the first non-1?

c If he rolls a non-1 on the first throw, the player is paid $1. Otherwise, the payoff is doubled
for each 1 that the player rolls before rolling a non-1. Thus, the player is paid $2 if he rolls
a 1 followed by a non-1; $4 if he rolls two 1s followed by a non-1; $8 if he rolls three 1s
followed by a non-1; etc. In general, if we let Y be the number of throws needed to obtain
the first non-1, then the player rolls (Y - 1) 1s before rolling his first non-1, and he is paid
2^(Y-1) dollars. What is the expected amount paid to the player?


If Y is a binomial random variable based on n trials and success probability p, show that

\[
P(Y > 1 \mid Y \ge 1) = \frac{1 - (1-p)^n - np(1-p)^{n-1}}{1 - (1-p)^n}.
\]


A starter motor used in a space vehicle has a high rate of reliability and was reputed to start on
any given occasion with probability .99999. What is the probability of at least one failure in 
the next 10,000 starts? 


Refer to Exercise 3.115. Find $\mu$, the expected value of Y, for the theoretical population by
using the probability distribution obtained in Exercise 3.115. Find the sample mean $\bar{y}$ for the
n = 100 measurements generated in Exercise 3.116. Does $\bar{y}$ provide a good estimate of $\mu$?


Find the population variance $\sigma^2$ for Exercise 3.115 and the sample variance $s^2$ for Exercise
3.116. Compare.


Toss a balanced die and let Y be the number of dots observed on the upper face. Find the mean
and variance of Y. Construct a probability histogram, and locate the interval $\mu \pm 2\sigma$. Verify
that Tchebysheff's theorem holds.


Two assembly lines I and II have the same rate of defectives in their production of voltage 
regulators. Five regulators are sampled from each line and tested. Among the total of ten tested 
regulators, four are defective. Find the probability that exactly two of the defective regulators 
came from line I. 


One concern of a gambler is that she will go broke before achieving her first win. Suppose that 
she plays a game in which the probability of winning is .1 (and is unknown to her). It costs her 
$10 to play and she receives $80 for a win. If she commences with $30, what is the probability 
that she wins exactly once before she loses her initial capital? 


The number of imperfections in the weave of a certain textile has a Poisson distribution with 
a mean of 4 per square yard. Find the probability that a 


a 1-square-yard sample will contain at least one imperfection.


b 3-square-yard sample will contain at least one imperfection. 


Refer to Exercise 3.195. The cost of repairing the imperfections in the weave is $10 per 
imperfection. Find the mean and standard deviation of the repair cost for an 8-square-yard bolt 
of the textile. 


The number of bacteria colonies of a certain type in samples of polluted water has a Poisson
distribution with a mean of 2 per cubic centimeter (cm³).


a If four 1-cm³ samples are independently selected from this water, find the probability that
at least one sample will contain one or more bacteria colonies.

b How many 1-cm³ samples should be selected in order to have a probability of approximately
.95 of seeing at least one bacteria colony?


One model for plant competition assumes that there is a zone of resource depletion around 
each plant seedling. Depending on the size of the zones and the density of the plants, the zones 
of resource depletion may overlap with those of other seedlings in the vicinity. When the seeds 
are randomly dispersed over a wide area, the number of neighbors that any seedling has within 
an area of size A usually follows a Poisson distribution with mean equal to A × d, where d is
the density of seedlings per unit area. Suppose that the density of seedlings is four per square
meter. What is the probability that a specified seedling has


a no neighbors within 1 meter? 


b at most three neighbors within 2 meters? 


Insulin-dependent diabetes (IDD) is a common chronic disorder in children. The disease occurs
most frequently in children of northern European descent, but the incidence ranges from a low
of 1 to 2 cases per 100,000 per year to a high of more than 40 cases per 100,000 in parts of
Finland.⁴ Let us assume that a region in Europe has an incidence of 30 cases per 100,000 per
year and that we randomly select 1000 children from this region.


a Can the distribution of the number of cases of IDD among those in the sample be approximated
by a Poisson distribution? If so, what is the mean of the approximating Poisson
distribution?

b What is the probability that we will observe at least two cases of IDD among the 1000 
children in the sample? 


Using the fact that

\[
e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + \cdots,
\]

expand the moment-generating function for the binomial distribution

\[
m(t) = (q + pe^t)^n
\]

into a power series in t. (Acquire only the low-order terms in t.) Identify $\mu'_i$ as the coefficient
of $t^i/i!$ appearing in the series. Specifically, find $\mu'_1$ and $\mu'_2$ and compare them with the results
of Exercise 3.146.


Refer to Exercises 3.103 and 3.106. In what interval would you expect the repair costs on these 
five machines to lie? (Use Tchebysheff’s theorem.) 


The number of cars driving past a parking area in a one-minute time interval has a Poisson 
distribution with mean $\lambda$. The probability that any individual driver actually wants to park his
or her car is p. Assume that individuals decide whether to park independently of one another. 


a If one parking place is available and it will take you one minute to reach the parking area, 
what is the probability that a space will still be available when you reach the lot? (Assume 
that no one leaves the lot during the one-minute interval.) 

b Let W denote the number of drivers who wish to park during a one-minute interval. Derive 
the probability distribution of W. 


A type of bacteria cell divides at a constant rate $\lambda$ over time. (That is, the probability that a cell
divides in a small interval of time t is approximately $\lambda t$.) Given that a population starts out at
time zero with k cells of this bacteria and that cell divisions are independent of one another,
the size of the population at time t, Y(t), has the probability distribution

\[
P[Y(t) = n] = \binom{n-1}{k-1} e^{-\lambda k t}\left(1 - e^{-\lambda t}\right)^{n-k}, \qquad n = k, k+1, \ldots.
\]


a Find the expected value and variance of Y(t) in terms of $\lambda$ and t.


b If, for a type of bacteria cell, $\lambda = .1$ per second and the population starts out with two cells
at time zero, find the expected value and variance of the population after five seconds.


The probability that any single driver will turn left at an intersection is .2. The left turn lane at 
this intersection has room for three vehicles. If the left turn lane is empty when the light turns 
red and five vehicles arrive at this intersection while the light is red, find the probability that 
the left turn lane will hold the vehicles of all of the drivers who want to turn left. 


An experiment consists of tossing a fair die until a 6 occurs four times. What is the probability 
that the process ends after exactly ten tosses with a 6 occurring on the ninth and tenth tosses? 


4. M. A. Atkinson, “Diet, Genetics, and Diabetes,” Food Technology 51(3), (1997): 77.


Accident records collected by an automobile insurance company give the following information.
The probability that an insured driver has an automobile accident is .15. If an accident has
occurred, the damage to the vehicle amounts to 20% of its market value with a probability of 
.80, to 60% of its market value with a probability of .12, and to a total loss with a probability 
of .08. What premium should the company charge on a $12,000 car so that the expected gain 
by the company is zero? 


The number of people entering the intensive care unit at a hospital on any single day possesses 
a Poisson distribution with a mean equal to five persons per day.


a What is the probability that the number of people entering the intensive care unit on a 
particular day is equal to 2? Is less than or equal to 2? 


b Is it likely that Y will exceed 10? Explain.


A recent survey suggests that Americans anticipate a reduction in living standards and that a 
steadily increasing level of consumption no longer may be as important as it was in the past. 
Suppose that a poll of 2000 people indicated 1373 in favor of forcing a reduction in the size of 
American automobiles by legislative means. Would you expect to observe as many as 1373 in 
favor of this proposition if, in fact, the general public was split 50-50 on the issue? Why?


A supplier of heavy construction equipment has found that new customers are normally obtained 
through customer requests for a sales call and that the probability of a sale of a particular piece 
of equipment is .3. If the supplier has three pieces of the equipment available for sale, what is 
the probability that it will take fewer than five customer contacts to clear the inventory? 


Calculate $P(|Y - \lambda| \le 2\sigma)$ for the Poisson probability distribution of Example 3.22. Does this
agree with the empirical rule? 


A merchant stocks a certain perishable item. She knows that on any given day she will have a 
demand for either two, three, or four of these items with probabilities .1, .4, and .5, respectively. 
She buys the items for $1.00 each and sells them for $1.20 each. If any are left at the end of 
the day, they represent a total loss. How many items should the merchant stock in order to 
maximize her expected daily profit? 


Show that the hypergeometric probability function approaches the binomial in the limit as
$N \to \infty$ and p = r/N remains constant. That is, show that

\[
\lim_{N \to \infty} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} p^{y} (1-p)^{n-y},
\]

for p = r/N constant.


A lot of N = 100 industrial products contains 40 defectives. Let Y be the number of defectives in
a random sample of size 20. Find p(10) by using (a) the hypergeometric probability distribution
and (b) the binomial probability distribution. Is N large enough that the value for p(10) 
obtained from the binomial distribution is a good approximation to that obtained using the 
hypergeometric distribution? 


For simplicity, let us assume that there are two kinds of drivers. The safe drivers, who are 70% 
of the population, have probability .1 of causing an accident in a year. The rest of the population
are accident makers, who have probability .5 of causing an accident in a year. The insurance 
premium is $400 times one’s probability of causing an accident in the following year. A new 
subscriber has an accident during the first year. What should be his insurance premium for the 
next year? 


It is known that 5% of the members of a population have disease A, which can be discovered
by a blood test. Suppose that N (a large number) people are to be tested. This can be done in
two ways: (1) Each person is tested separately, or (2) the blood samples of k people are pooled
together and analyzed. (Assume that N = nk, with n an integer.) If the test is negative, all
of them are healthy (that is, just this one test is needed). If the test is positive, each of the k
persons must be tested separately (that is, a total of k + 1 tests are needed).


a For fixed k, what is the expected number of tests needed in option 2?

b Find the k that will minimize the expected number of tests in option 2.

c If k is selected as in part (b), on the average how many tests does option 2 save in comparison
with option 1?


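Let Y have a hypergeometric distribution

\[
p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}, \qquad y = 0, 1, 2, \ldots, n.
\]

a Show that

\[
P(Y = n) = p(n) = \left(\frac{r}{N}\right)\left(\frac{r-1}{N-1}\right)\left(\frac{r-2}{N-2}\right)\cdots\left(\frac{r-n+1}{N-n+1}\right).
\]

b Write p(y) as p(y|r). Show that if $r_1 < r_2$, then

\[
\frac{p(y \mid r_1)}{p(y \mid r_2)} > \frac{p(y+1 \mid r_1)}{p(y+1 \mid r_2)}.
\]

c Apply the binomial expansion to each factor in the following equation:

\[
(1+a)^{N_1}(1+a)^{N_2} = (1+a)^{N_1+N_2}.
\]

Now compare the coefficients of $a^n$ on both sides to prove that

\[
\binom{N_1}{0}\binom{N_2}{n} + \binom{N_1}{1}\binom{N_2}{n-1} + \cdots + \binom{N_1}{n}\binom{N_2}{0} = \binom{N_1+N_2}{n}.
\]

d Using the result of part (c), conclude that

\[
\sum_{y=0}^{n} p(y) = 1.
\]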


Use the result derived in Exercise 3.216(c) and Definition 3.4 to derive directly the mean of a 
hypergeometric random variable. 


Use the results of Exercises 3.216(c) and 3.217 to show that, for a hypergeometric random 
variable, 


\[
E[Y(Y-1)] = \frac{r(r-1)n(n-1)}{N(N-1)}.
\]


4.1 Introduction


A moment of reflection on random variables encountered in the real world should 
convince you that not all random variables of interest are discrete random variables. 
The number of days that it rains in a period of n days is a discrete random variable
because the number of days must take one of the n + 1 values 0, 1, 2,..., or n. 
Now consider the daily rainfall at a specified geographical point. Theoretically, with 
measuring equipment of perfect accuracy, the amount of rainfall could take on any 
value between 0 and 5 inches. As a result, each of the uncountably infinite number 
of points in the interval (0, 5) represents a distinct possible value of the amount of 
rainfall in a day. A random variable that can take on any value in an interval is called 
continuous, and the purpose of this chapter is to study probability distributions for 
continuous random variables. The yield of an antibiotic in a fermentation process is 
a continuous random variable, as is the length of life, in years, of a washing machine. 
The line segments over which these two random variables are defined are contained 
in the positive half of the real line. This does not mean that, if we observed enough 
washing machines, we would eventually observe an outcome corresponding to every 
value in the interval (3, 7); rather it means that no value between 3 and 7 can be ruled 
out as a possible value for the number of years that a washing machine remains in
service. 

The probability distribution for a discrete random variable can always be given by 
assigning a nonnegative probability to each of the possible values the variable may 
assume. In every case, of course, the sum of all the probabilities that we assign must be 
equal to 1. Unfortunately, the probability distribution for a continuous random variable
cannot be specified in the same way. It is mathematically impossible to assign nonzero 
probabilities to all the points on a line interval while satisfying the requirement that 
the probabilities of the distinct possible values sum to 1. As a result, we must develop 
a different method to describe the probability distribution for a continuous random 
variable. 


4.2 The Probability Distribution
for a Continuous Random Variable


Before we can state a formal definition for a continuous random variable, we must 
define the distribution function (or cumulative distribution function) associated with 
a random variable. 


DEFINITION 4.1 Let Y denote any random variable. The distribution function of Y, denoted by
F(y), is such that $F(y) = P(Y \le y)$ for $-\infty < y < \infty$.


The nature of the distribution function associated with a random variable deter- 
mines whether the variable is continuous or discrete. Consequently, we will commence 
our discussion by examining the distribution function for a discrete random variable 
and noting the characteristics of this function. 


EXAMPLE 4.1 Suppose that Y has a binomial distribution with n = 2 and p = 1/2. Find F(y).

Solution The probability function for Y is given by

\[
p(y) = \binom{2}{y}\left(\frac{1}{2}\right)^{y}\left(\frac{1}{2}\right)^{2-y}, \qquad y = 0, 1, 2,
\]

which yields

\[
p(0) = 1/4, \qquad p(1) = 1/2, \qquad p(2) = 1/4.
\]

FIGURE 4.1 Binomial distribution function, n = 2, p = 1/2


What is $F(-2) = P(Y \le -2)$? Because the only values of Y that are assigned
positive probabilities are 0, 1, and 2 and none of these values are less than or equal
to $-2$, $F(-2) = 0$. Using similar logic, F(y) = 0 for all y < 0. What is F(1.5)?
The only values of Y that are less than or equal to 1.5 and have nonzero probabilities
are the values 0 and 1. Therefore,

\[
F(1.5) = P(Y \le 1.5) = P(Y = 0) + P(Y = 1) = (1/4) + (1/2) = 3/4.
\]

In general,

\[
F(y) = P(Y \le y) =
\begin{cases}
0, & \text{for } y < 0, \\
1/4, & \text{for } 0 \le y < 1, \\
3/4, & \text{for } 1 \le y < 2, \\
1, & \text{for } y \ge 2.
\end{cases}
\]

A graph of F(y) is given in Figure 4.1.
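
The step-function behavior found in Example 4.1 can be checked with a short computation. The following is a minimal sketch, not part of the original example; the function names `binomial_pmf` and `binomial_cdf` are our own, and exact fractions are used so that the values can be compared directly with 1/4 and 3/4.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with n trials and success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binomial_cdf(y, n, p):
    """F(y) = P(Y <= y): add the probabilities of every possible value k with k <= y."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k <= y)


n, p = 2, Fraction(1, 2)
for y in (-2, 0, 0.5, 1, 1.5, 2, 3):
    # F stays flat between the possible values 0, 1, 2 and jumps at each of them.
    print(y, binomial_cdf(y, n, p))
```

Evaluating the function at points such as 0.5 and 1.5 shows that F(y) does not change between the possible values of Y, which is the flat portion of each step.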


In Example 4.1 the points between 0 and 1 or between 1 and 2 all had probability 0 
and contributed nothing to the cumulative probability depicted by the distribution 
function. As a result, the cumulative distribution function stayed flat between the 
possible values of Y and increased in jumps or steps at each of the possible values 
of Y. Functions that behave in such a manner are called step functions. Distribution 
functions for discrete random variables are always step functions because the cumu- 
lative distribution function increases only at the finite or countable number of points 
with positive probabilities. 

Because the distribution function associated with any random variable is such
that $F(y) = P(Y \le y)$, from a practical point of view it is clear that
$F(-\infty) \equiv \lim_{y \to -\infty} P(Y \le y)$ must equal zero. If we consider any two values $y_1 < y_2$, then
$P(Y \le y_1) \le P(Y \le y_2)$; that is, $F(y_1) \le F(y_2)$. So, a distribution function, F(y),
is always a monotonic, nondecreasing function. Further, it is clear that
$F(\infty) \equiv \lim_{y \to \infty} P(Y \le y) = 1$. These three characteristics define the properties of any
distribution function and are summarized in the following theorem.
FIGURE 4.2 Distribution function for a continuous random variable


THEOREM 4.1 Properties of a Distribution Function¹ If F(y) is a distribution function, then
1. $F(-\infty) \equiv \lim_{y \to -\infty} F(y) = 0$.
2. $F(\infty) \equiv \lim_{y \to \infty} F(y) = 1$.
3. F(y) is a nondecreasing function of y. [If $y_1$ and $y_2$ are any values such
that $y_1 < y_2$, then $F(y_1) \le F(y_2)$.]

You should check that the distribution function developed in Example 4.1 has each 
of these properties. 

Let us now examine the distribution function for a continuous random variable. 
Suppose that, for all practical purposes, the amount of daily rainfall, Y, must be
less than 6 inches. For every $0 < y_1 < y_2 < 6$, the interval $(y_1, y_2)$ has a positive
probability of including Y, no matter how close $y_1$ gets to $y_2$. It follows that F(y) in
this case should be a smooth, increasing function over some interval of real numbers,
as graphed in Figure 4.2.

We are thus led to the definition of a continuous random variable. 


DEFINITION 4.2 A random variable Y with distribution function F(y) is said to be continuous
if F(y) is continuous, for $-\infty < y < \infty$.²




1. To be mathematically rigorous, if F(y) is a valid distribution function, then F(y) also must be right 
continuous. 

2. To be mathematically precise, we also need the first derivative of F (y) to exist and be continuous except 
for, at most, a finite number of points in any finite interval. The distribution functions for the continuous 
random variables discussed in this text satisfy this requirement. 


FIGURE 4.3 The distribution function


If Y is a continuous random variable, then for any real number y,
P(Y = y) = 0.


If this were not true and $P(Y = y_0) = p_0 > 0$, then F(y) would have a discontinuity
(jump) of size $p_0$ at the point $y_0$, violating the assumption that Y was continuous.
Practically speaking, the fact that continuous random variables have zero probability 
at discrete points should not bother us. Consider the example of measuring daily 
rainfall. What is the probability that we will see a daily rainfall measurement of 
exactly 2.193 inches? It is quite likely that we would never observe that exact value 
even if we took rainfall measurements for a lifetime, although we might see many 
days with measurements between 2 and 3 inches. 

The derivative of F(y) is another function of prime importance in probability 
theory and statistics. 


DEFINITION 4.3 Let F(y) be the distribution function for a continuous random variable Y. Then
f(y), given by

\[
f(y) = \frac{dF(y)}{dy} = F'(y)
\]

wherever the derivative exists, is called the probability density function for the
random variable Y.


It follows from Definitions 4.2 and 4.3 that F(y) can be written as

\[
F(y) = \int_{-\infty}^{y} f(t)\,dt,
\]

where $f(\cdot)$ is the probability density function and t is used as the variable of
integration. The relationship between the distribution and density functions is shown
graphically in Figure 4.3.
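
The two-way relationship between F(y) and f(y) can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than an example from the text: the density f(t) = 2t on [0, 1] is a hypothetical choice made only for this demonstration, and the integral is approximated with a simple midpoint rule.

```python
def density(t):
    """Hypothetical density used only for illustration: f(t) = 2t on [0, 1], 0 elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0


def distribution_function(y, lower=-1.0, steps=20_000):
    """Approximate F(y), the integral of f(t) dt from -infinity to y, by a midpoint rule.

    The density is zero below 0, so starting the sum at `lower` = -1 loses nothing.
    """
    if y <= lower:
        return 0.0
    width = (y - lower) / steps
    return sum(density(lower + (i + 0.5) * width) for i in range(steps)) * width


# Integrating the density recovers F; for this density F(y) = y^2 on [0, 1].
print(distribution_function(0.5), 0.5**2)

# Differentiating F (here by a central difference) recovers the density.
h = 1e-3
slope = (distribution_function(0.5 + h) - distribution_function(0.5 - h)) / (2 * h)
print(slope, density(0.5))
```

The first printed pair should agree closely, and so should the second, reflecting that f(y) is the derivative of F(y) wherever that derivative exists.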

The probability density function is a theoretical model for the frequency distri- 
bution (histogram) of a population of measurements. For example, observations of 
the lengths of life of washers of a particular brand will generate measurements that 
can be characterized by a relative frequency histogram, as discussed in Chapter 1. 
Conceptually, the experiment could be repeated ad infinitum, thereby generating a 
relative frequency distribution (a smooth curve) that would characterize the popu- 
lation of interest to the manufacturer. This theoretical relative frequency distribu- 
tion corresponds to the probability density function for the length of life of a single 
machine, Y. 


Because the distribution function F(y) for any random variable always has the
properties given in Theorem 4.1, density functions must have some corresponding
properties. Because F(y) is a nondecreasing function, the derivative f(y) is never
negative. Further, we know that $F(\infty) = 1$ and, therefore, that $\int_{-\infty}^{\infty} f(t)\,dt = 1$. In
summary, the properties of a probability density function are as given in the following
theorem.


THEOREM 4.2 Properties of a Density Function If f(y) is a density function for a continuous
random variable, then

1. $f(y) \ge 0$ for all y, $-\infty < y < \infty$.
2. $\int_{-\infty}^{\infty} f(y)\,dy = 1$.


The next example gives the distribution function and density function for a 
continuous random variable. 


EXAMPLE 4.2 Suppose that

\[
F(y) =
\begin{cases}
0, & \text{for } y < 0, \\
y, & \text{for } 0 \le y \le 1, \\
1, & \text{for } y > 1.
\end{cases}
\]

Find the probability density function for Y and graph it.

Solution Because the density function f(y) is the derivative of the distribution function F(y),
when the derivative exists,

\[
f(y) = \frac{dF(y)}{dy} =
\begin{cases}
\dfrac{d(0)}{dy} = 0, & \text{for } y < 0, \\[4pt]
\dfrac{d(y)}{dy} = 1, & \text{for } 0 < y < 1, \\[4pt]
\dfrac{d(1)}{dy} = 0, & \text{for } y > 1,
\end{cases}
\]

and f(y) is undefined at y = 0 and y = 1. A graph of F(y) is shown in Figure 4.4.

FIGURE 4.4 Distribution function F(y) for Example 4.2


FIGURE 4.5 Density function f(y) for Example 4.2

The graph of f(y) for Example 4.2 is shown in Figure 4.5. Notice that the distribution
and density functions given in Example 4.2 have all the properties required
of distribution and density functions, respectively. Moreover, F(y) is a continuous
function of y, but f(y) is discontinuous at the points y = 0, 1. In general, the distribution
function for a continuous random variable must be continuous, but the density
function need not be everywhere continuous.
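
To see why f(y) is undefined at the corners of F(y) in Example 4.2, one can compare difference quotients taken from the left and from the right. The following is a small sketch with helper names of our own choosing (`F`, `slope`); it is meant as an illustration, not as a general method for differentiating distribution functions.

```python
def F(y):
    """Distribution function from Example 4.2."""
    if y < 0:
        return 0.0
    if y <= 1:
        return float(y)
    return 1.0


def slope(func, y, h):
    """Difference quotient (func(y + h) - func(y)) / h; h may be negative."""
    return (func(y + h) - func(y)) / h


h = 1e-6
# Away from y = 0 and y = 1 the slope matches the density: 0, 1, and 0.
print(slope(F, -0.5, h), slope(F, 0.5, h), slope(F, 1.5, h))
# At y = 0 the left-hand and right-hand slopes disagree, so F'(0) does not exist.
print(slope(F, 0.0, -h), slope(F, 0.0, h))
# The same happens at y = 1.
print(slope(F, 1.0, -h), slope(F, 1.0, h))
```

The disagreement between the one-sided slopes at y = 0 and y = 1 is the numerical counterpart of the jumps in the graph of f(y).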


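EXAMPLE 4.3 Let Y be a continuous random variable with probability density function given by

\[
f(y) =
\begin{cases}
3y^2, & 0 \le y \le 1, \\
0, & \text{elsewhere.}
\end{cases}
\]

Find F(y). Graph both f(y) and F(y).

Solution The graph of f(y) appears in Figure 4.6. Because

\[
F(y) = \int_{-\infty}^{y} f(t)\,dt,
\]

we have, for this example,

\[
F(y) =
\begin{cases}
\int_{-\infty}^{y} 0\,dt = 0, & \text{for } y < 0, \\[4pt]
\int_{-\infty}^{0} 0\,dt + \int_{0}^{y} 3t^2\,dt = 0 + t^3\Big]_{0}^{y} = y^3, & \text{for } 0 \le y \le 1, \\[4pt]
\int_{-\infty}^{0} 0\,dt + \int_{0}^{1} 3t^2\,dt + \int_{1}^{y} 0\,dt = 0 + t^3\Big]_{0}^{1} + 0 = 1, & \text{for } 1 < y.
\end{cases}
\]

FIGURE 4.6 Density function for Example 4.3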

Notice that some of the integrals that we evaluated yield a value of 0. These are 
included for completeness in this initial example. In future calculations, we will 
not explicitly display any integral that has value 0. The graph of F(y) is given in 
Figure 4.7. 
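
The closed form obtained in Example 4.3 can be compared with a direct numerical integration of the density. The sketch below assumes the result F(y) = y³ on [0, 1] from the example and uses composite Simpson's rule, which is our choice of method here because it is exact (up to rounding) for a quadratic integrand such as 3t².

```python
def f(y):
    """Density from Example 4.3: 3y^2 on [0, 1], 0 elsewhere."""
    return 3.0 * y * y if 0.0 <= y <= 1.0 else 0.0


def F(y):
    """Closed-form distribution function derived in Example 4.3."""
    if y < 0:
        return 0.0
    if y <= 1:
        return y**3
    return 1.0


def simpson(func, a, b, panels=100):
    """Composite Simpson's rule on [a, b]; `panels` must be even."""
    h = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


for y in (0.25, 0.5, 0.75, 1.0):
    # For 0 <= y <= 1 only the piece of the integral over [0, y] is nonzero.
    print(y, F(y), simpson(f, 0.0, y))
```

The two columns of values should agree to within rounding error, which confirms the integration carried out in the example.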




$F(y_0)$ gives the probability that $Y \le y_0$. As you will see in subsequent chapters, it
is often of interest to determine the value, y, of a random variable Y that is such that
$P(Y \le y)$ equals or exceeds some specified value.
FIGURE 4.7 Distribution function for Example 4.3


DEFINITION 4.4 Let Y denote any random variable. If 0 < p < 1, the pth quantile of Y,
denoted by $\phi_p$, is the smallest value such that $P(Y \le \phi_p) = F(\phi_p) \ge p$. If Y
is continuous, $\phi_p$ is the smallest value such that $F(\phi_p) = P(Y \le \phi_p) = p$.
Some prefer to call $\phi_p$ the 100pth percentile of Y.


An important special case is p = 1/2, and $\phi_{.5}$ is the median of the random variable
Y. In Example 4.3, the median of the random variable is such that $F(\phi_{.5}) = .5$ and
is easily seen to be such that $(\phi_{.5})^3 = .5$, or equivalently, that the median of Y is
$\phi_{.5} = (.5)^{1/3} = .7937$.
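
When F(y) cannot be inverted by hand, a quantile can be located numerically. The following sketch uses bisection, which is our choice of method and not one prescribed by the text; the function name `quantile` and its arguments are hypothetical. It is applied to the distribution function of Example 4.3 so that the result can be compared with the median found above.

```python
def F(y):
    """Distribution function from Example 4.3 (F(y) = y^3 on [0, 1])."""
    if y < 0:
        return 0.0
    if y <= 1:
        return y**3
    return 1.0


def quantile(cdf, p, lo, hi, tol=1e-10):
    """Bisection for the smallest y in [lo, hi] with cdf(y) >= p.

    Assumes cdf is nondecreasing with cdf(lo) < p <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= p:
            hi = mid  # mid already reaches probability p, so the quantile is at or below mid
        else:
            lo = mid  # mid falls short of p, so the quantile lies above mid
    return hi


median = quantile(F, 0.5, 0.0, 1.0)
print(round(median, 4))  # 0.7937
```

Because the search only relies on F(y) being nondecreasing, the same routine can be reused for other values of p, such as the .95-quantile.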

The next step is to find the probability that Y falls in a specific interval; that is,
$P(a \le Y \le b)$. From Chapter 1 we know that this probability corresponds to the area
under the frequency distribution over the interval $a \le y \le b$. Because f(y) is the
theoretical counterpart of the frequency distribution, we would expect $P(a \le Y \le b)$
to equal a corresponding area under the density function f(y). This indeed is true
because, if a < b,

\[
P(a < Y \le b) = P(Y \le b) - P(Y \le a) = F(b) - F(a) = \int_{a}^{b} f(y)\,dy.
\]

Because P(Y = a) = 0, we have the following result.


THEOREM 4.3 If the random variable Y has density function f(y) and a < b, then the
probability that Y falls in the interval [a, b] is

\[
P(a \le Y \le b) = \int_{a}^{b} f(y)\,dy.
\]

This probability is the shaded area in Figure 4.8.

FIGURE 4.8 $P(a \le Y \le b)$


If Y is a continuous random variable and a and b are constants such that a < b,
then P(Y = a) = 0 and P(Y = b) = 0 and Theorem 4.3 implies that

\[
P(a < Y < b) = P(a \le Y < b) = P(a < Y \le b) = P(a \le Y \le b) = \int_{a}^{b} f(y)\,dy.
\]

The fact that the above string of equalities is not, in general, true for discrete random 
variables is illustrated in Exercise 4.7. 


EXAMPLE 4.4 Given $f(y) = cy^2$, $0 \le y \le 2$, and f(y) = 0 elsewhere, find the value of c for which
f(y) is a valid density function.

Solution We require a value for c such that

\[
F(\infty) = \int_{-\infty}^{\infty} f(y)\,dy = 1
= \int_{0}^{2} cy^2\,dy = \frac{cy^3}{3}\Big]_{0}^{2} = \left(\frac{8}{3}\right)c.
\]

Thus, (8/3)c = 1, and we find that c = 3/8.
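
The normalizing step in Example 4.4 amounts to dividing 1 by the area under y² over [0, 2]. A minimal sketch of that arithmetic follows; the helper `integral_of_power` is a hypothetical name of our own, and exact fractions are used so that the result appears as 3/8 rather than a decimal.

```python
from fractions import Fraction


def integral_of_power(k, a, b):
    """Exact integral of y**k over [a, b] for a nonnegative integer k."""
    return (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)


# f(y) = c * y^2 on [0, 2]; the total area under f must equal 1.
area_without_c = integral_of_power(2, 0, 2)  # area under y^2 alone
c = 1 / area_without_c                        # the constant that rescales the area to 1
print(area_without_c, c)
```

The same two lines of arithmetic apply to any density of the form c times a power of y on a finite interval.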


EXAMPLE 4.5 Find $P(1 \le Y \le 2)$ for Example 4.4. Also find P(1 < Y < 2).

Solution

\[
P(1 \le Y \le 2) = \int_{1}^{2} f(y)\,dy = \frac{3}{8}\int_{1}^{2} y^2\,dy = \left(\frac{3}{8}\right)\frac{y^3}{3}\Big]_{1}^{2} = \frac{7}{8}.
\]

Because Y has a continuous distribution, it follows that P(Y = 1) = P(Y = 2) = 0
and, therefore, that

\[
P(1 < Y < 2) = P(1 \le Y \le 2) = \frac{3}{8}\int_{1}^{2} y^2\,dy = \frac{7}{8}.
\]
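
The probability in Example 4.5 can also be obtained as F(2) − F(1). The sketch below assumes the distribution function F(y) = y³/8 on [0, 2], which we obtain by integrating the density (3/8)y² of Example 4.4 from 0 to y; the function names are our own.

```python
from fractions import Fraction


def F(y):
    """Distribution function for f(y) = (3/8) y^2 on [0, 2]: F(y) = y^3 / 8 there."""
    y = Fraction(y)
    if y < 0:
        return Fraction(0)
    if y <= 2:
        return y**3 / 8
    return Fraction(1)


def interval_probability(a, b):
    """P(a <= Y <= b) = F(b) - F(a) for a continuous random variable Y."""
    return F(b) - F(a)


print(interval_probability(1, 2))  # 7/8
```

Because Y is continuous, the same value is returned whether the endpoints 1 and 2 are included in the interval or not.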


Probability statements regarding a continuous random variable Y are meaningful 
only if, first, the integral defining the probability exists and, second, the resulting 
probabilities agree with the axioms of Chapter 2. These two conditions will always 
be satisfied if we consider only probabilities associated with a finite or countable 
collection of intervals. Because we almost always are interested in probabilities that 
continuous variables fall in intervals, this consideration will cause us no practical diffi- 
culty. Some density functions that provide good models for population frequency dis- 
tributions encountered in practical applications are presented in subsequent sections. 


Exercises


Let Y be a random variable with p(y) given in the table below. 


y    | 1   2   3   4
p(y) | .4  .3  .2  .1


a Give the distribution function, F(y). Be sure to specify the value of F(y) for all y,
$-\infty < y < \infty$.


b Sketch the distribution function given in part (a). 


A box contains five keys, only one of which will open a lock. Keys are randomly selected and 
tried, one at a time, until the lock is opened (keys that do not work are discarded before another 
is tried). Let Y be the number of the trial on which the lock is opened. 


a Find the probability function for Y.

b Give the corresponding distribution function.

c What is P(Y < 3)? $P(Y \le 3)$? P(Y = 3)?

d If Y is a continuous random variable, we argued that, for all $-\infty < a < \infty$, P(Y = a) = 0.
Do any of your answers in part (c) contradict this claim? Why?


A Bernoulli random variable is one that assumes only two values, 0 and 1, with p(1) = p and
$p(0) = 1 - p \equiv q$.

a Sketch the corresponding distribution function.

b Show that this distribution function has the properties given in Theorem 4.1. 


Let Y be a binomial random variable with n = 1 and success probability p.


a Find the probability and distribution function for Y. 


b Compare the distribution function from part (a) with that in Exercise 4.3(a). What do you 
conclude? 


Suppose that Y is a random variable that takes on only integer values 1, 2, . . . and has distribution
function F(y). Show that the probability function p(y) = P(Y = y) is given by

\[
p(y) =
\begin{cases}
F(1), & y = 1, \\
F(y) - F(y-1), & y = 2, 3, \ldots.
\end{cases}
\]


Consider a random variable with a geometric distribution (Section 3.5); that is,

\[
p(y) = q^{y-1}p, \qquad y = 1, 2, 3, \ldots, \quad 0 < p < 1.
\]

a Show that Y has distribution function F(y) such that $F(i) = 1 - q^i$, i = 0, 1, 2, ... and
that, in general,

\[
F(y) =
\begin{cases}
0, & y < 0, \\
1 - q^i, & i \le y < i + 1, \quad \text{for } i = 0, 1, 2, \ldots.
\end{cases}
\]

b Show that the preceding cumulative distribution function has the properties given in 
Theorem 4.1. 


Let Y be a binomial random variable with n = 10 and p = .2. 


a Use Table 1, Appendix 3, to obtain P(2 < Y < 5) and P(2 < Y <5). Are the probabilities 
that Y falls in the intevals (2, 5) and [2, 5) equal? Why or why not? 


4.8 


4.9 


4.10 


4.12 
b Use Table 1, Appendix 3, to obtain P(2 < Y < 5) and P(2 < Y ≤ 5). Are these two
probabilities equal? Why or why not?


c Earlier in this section, we argued that if Y is continuous and a < b, then P(a < Y < b) =
P(a ≤ Y < b). Does the result in part (a) contradict this claim? Why?


Suppose that Y has density function

\[
f(y) = \begin{cases} ky(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of k that makes f(y) a probability density function.
b Find P(.4 ≤ Y ≤ 1).
c Find P(.4 ≤ Y < 1).
d Find P(Y ≤ .4 | Y ≤ .8).
e Find P(Y < .4 | Y < .8).

A random variable Y has the following distribution function:

\[
F(y) = P(Y \le y) = \begin{cases}
0, & \text{for } y < 2, \\
1/8, & \text{for } 2 \le y < 2.5, \\
3/16, & \text{for } 2.5 \le y < 4, \\
1/2, & \text{for } 4 \le y < 5.5, \\
5/8, & \text{for } 5.5 \le y < 6, \\
11/16, & \text{for } 6 \le y < 7, \\
1, & \text{for } y \ge 7.
\end{cases}
\]

a Is Y a continuous or discrete random variable? Why?
b What values of Y are assigned positive probabilities?
c Find the probability function for Y.
d What is the median, φ.5, of Y?


Refer to the density function given in Exercise 4.8. 


a Find the .95-quantile, φ.95, such that P(Y ≤ φ.95) = .95.
b Find a value y0 so that P(Y < y0) = .95.
c Compare the values for φ.95 and y0 that you obtained in parts (a) and (b). Explain the
relationship between these two values.


Suppose that Y possesses the density function

\[
f(y) = \begin{cases} cy, & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of c that makes f(y) a probability density function.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) to find P(1 ≤ Y ≤ 2).
e Use f(y) and geometry to find P(1 ≤ Y ≤ 2).


The length of time to failure (in hundreds of hours) for a transistor is a random variable Y with
distribution function given by

\[
F(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y^2}, & y \ge 0. \end{cases}
\]
4.13


4.14 


4.15 


4.16 


a Show that F(y) has the properties of a distribution function.

b Find the .30-quantile, φ.30, of Y.

c Find f(y).

d Find the probability that the transistor operates for at least 200 hours.

e Find P(Y > 100 | Y ≤ 200).


A supplier of kerosene has a 150-gallon tank that is filled at the beginning of each week. His 
weekly demand shows a relative frequency behavior that increases steadily up to 100 gallons 
and then levels off between 100 and 150 gallons. If Y denotes weekly demand in hundreds of 
gallons, the relative frequency of demand can be modeled by 

\[
f(y) = \begin{cases} y, & 0 \le y \le 1, \\ 1, & 1 < y \le 1.5, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find F(y).

b Find P(0 ≤ Y ≤ .5).

c Find P(.5 ≤ Y ≤ 1.2).

A gas station operates two pumps, each of which can pump up to 10,000 gallons of gas in
a month. The total amount of gas pumped at the station in a month is a random variable Y
(measured in 10,000 gallons) with a probability density function given by

\[
f(y) = \begin{cases} y, & 0 < y < 1, \\ 2 - y, & 1 \le y < 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Graph f(y).
b Find F(y) and graph it.
c Find the probability that the station will pump between 8000 and 12,000 gallons in a
particular month.


d Given that the station pumped more than 10,000 gallons in a particular month, find the 
probability that the station pumped more than 15,000 gallons during the month. 


As a measure of intelligence, mice are timed when going through a maze to reach a reward
of food. The time (in seconds) required for any mouse is a random variable Y with a density
function given by

\[
f(y) = \begin{cases} \dfrac{b}{y^2}, & y \ge b, \\ 0, & \text{elsewhere,} \end{cases}
\]

where b is the minimum possible time needed to traverse the maze.


a Show that f(y) has the properties of a density function.
b Find F(y).
c Find P(Y > b + c) for a positive constant c.
d If c and d are both positive constants such that d > c, find P(Y > b + d | Y > b + c).

Let Y possess a density function

\[
f(y) = \begin{cases} c(2 - y), & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]
4.19
a Find c.

b Find F(y).

c Graph f(y) and F(y).

d Use F(y) in part (b) to find P(1 ≤ Y ≤ 2).

e Use geometry and the graph for f(y) to calculate P(1 ≤ Y ≤ 2).


The length of time required by students to complete a one-hour exam is a random variable with
a density function given by

\[
f(y) = \begin{cases} cy^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find c.

b Find F(y).

c Graph f(y) and F(y).

d Use F(y) in part (b) to find F(−1), F(0), and F(1).

e Find the probability that a randomly selected student will finish in less than half an hour.

f Given that a particular student needs at least 15 minutes to complete the exam, find the
probability that she will require at least 30 minutes to finish.


Let Y have the density function given by

\[
f(y) = \begin{cases} .2, & -1 < y \le 0, \\ .2 + cy, & 0 < y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find c.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) in part (b) to find F(−1), F(0), and F(1).
e Find P(0 ≤ Y ≤ .5).
f Find P(Y > .5 | Y > .1).


Let the distribution function of a random variable Y be

\[
F(y) = \begin{cases}
0, & y \le 0, \\
\dfrac{y}{8}, & 0 < y < 2, \\
\dfrac{y^2}{16}, & 2 \le y < 4, \\
1, & y \ge 4.
\end{cases}
\]

a Find the density function of Y.
b Find P(1 ≤ Y ≤ 3).
c Find P(Y ≥ 1.5).
d Find P(Y ≥ 1 | Y ≤ 3).
4.3


DEFINITION 4.5 


THEOREM 4.4 


Expected Values for Continuous 
Random Variables 


The next step in the study of continuous random variables is to find their means,
variances, and standard deviations, thereby acquiring numerical descriptive measures
associated with their distributions. Many times it is difficult to find the probability
distribution for a random variable Y or a function of a random variable, g(Y).
Even if the density function for a random variable is known, it can be difficult
to evaluate appropriate integrals (we will see this to be the case when a random
variable has a gamma distribution, Section 4.6). When we encounter these situations,
the approximate behavior of variables of interest can be established by using
their moments and the empirical rule or Tchebysheff's theorem (Chapters 1
and 3).


The expected value of a continuous random variable Y is

\[
E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy,
\]

provided that the integral exists.³


If the definition of the expected value for a discrete random variable Y, E(Y) =
Σ_y y p(y), is meaningful, then Definition 4.5 also should agree with our intuitive
notion of a mean. The quantity f(y) dy corresponds to p(y) for the discrete case, and
integration evolves from and is analogous to summation. Hence, E(Y) in Definition
4.5 agrees with our notion of an average, or mean.

As in the discrete case, we are sometimes interested in the expected value of a 
function of a random variable. A result that permits us to evaluate such an expected 
value is given in the following theorem. 


Let g(Y) be a function of Y; then the expected value of g(Y) is given by

\[
E[g(Y)] = \int_{-\infty}^{\infty} g(y) f(y)\,dy,
\]

provided that the integral exists.


The proof of Theorem 4.4 is similar to that of Theorem 3.2 and is omitted. The
expected values of three important functions of a continuous random variable Y evolve
as a consequence of well-known theorems of integration. As expected, these results
lead to conclusions analogous to those contained in Theorems 3.3, 3.4, and 3.5. As a
consequence, the proof of Theorem 4.5 will be left as an exercise.


3. Technically, E(Y) is said to exist if

\[
\int_{-\infty}^{\infty} |y| f(y)\,dy < \infty.
\]

This will be the case in all expectations that we discuss, and we will not mention this additional condition
each time that we define an expected value.


Let c be a constant and let g(Y), g₁(Y), g₂(Y), . . . , gₖ(Y) be functions of a
continuous random variable Y. Then the following results hold:


1. E(c) = c.
2. E[cg(Y)] = cE[g(Y)].
3. E[g₁(Y) + g₂(Y) + · · · + gₖ(Y)] = E[g₁(Y)] + E[g₂(Y)] + · · · + E[gₖ(Y)].


As in the case of discrete random variables, we often seek the expected value
of the function g(Y) = (Y − μ)². As before, the expected value of this function
is the variance of the random variable Y. That is, as in Definition 3.5, V(Y) =
E(Y − μ)². It is a simple exercise to show that Theorem 4.5 implies that V(Y) =
E(Y²) − μ².


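EXAMPLE 4.6


In Example 4.4 we determined that f(y) = (3/8)y² for 0 ≤ y ≤ 2, f(y) = 0
elsewhere, is a valid density function. If the random variable Y has this density
function, find μ = E(Y) and σ² = V(Y).


Solution


According to Definition 4.5,

\[
E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy
     = \int_{0}^{2} y \left(\frac{3}{8}\right) y^2\,dy
     = \left(\frac{3}{8}\right)\left(\frac{1}{4}\right) y^4 \Big|_{0}^{2} = 1.5.
\]

The variance of Y can be found once we determine E(Y²). In this case,

\[
E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy
       = \int_{0}^{2} y^2 \left(\frac{3}{8}\right) y^2\,dy
       = \left(\frac{3}{8}\right)\left(\frac{1}{5}\right) y^5 \Big|_{0}^{2} = 2.4.
\]

Thus, σ² = V(Y) = E(Y²) − [E(Y)]² = 2.4 − (1.5)² = 0.15.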
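
The arithmetic in Example 4.6 can be checked exactly with rational numbers. The following is a minimal sketch in Python (a language choice made here, not one used by the text); the helper name integrate_monomial is hypothetical and handles only densities of the form (constant) × y^power on an interval, which is all that this example needs.

```python
from fractions import Fraction


def integrate_monomial(coef, power, lo, hi):
    """Exact integral of coef * y**power over [lo, hi] (hypothetical helper)."""
    n = power + 1
    return coef * (Fraction(hi) ** n - Fraction(lo) ** n) / n


# Density from Example 4.6: f(y) = (3/8) y^2 on [0, 2], 0 elsewhere.
COEF, POWER, LO, HI = Fraction(3, 8), 2, 0, 2

total = integrate_monomial(COEF, POWER, LO, HI)       # integral of f(y)
mean = integrate_monomial(COEF, POWER + 1, LO, HI)    # E(Y)   = integral of y f(y)
second = integrate_monomial(COEF, POWER + 2, LO, HI)  # E(Y^2) = integral of y^2 f(y)
variance = second - mean ** 2                         # V(Y) = E(Y^2) - [E(Y)]^2

print(total, mean, second, variance)
```

The four printed fractions are the total area under f(y), E(Y), E(Y²), and V(Y); they should agree with the values 1, 1.5, 2.4, and 0.15 obtained in the example.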
4.20


4.21 


4.22 


4.23 
4.24 


4.25 


4.26 


4.27 


4.28 


Exercises 


If, as in Exercise 4.16, Y has density function

\[
f(y) = \begin{cases} (1/2)(2 - y), & 0 \le y \le 2, \\ 0, & \text{elsewhere,} \end{cases}
\]

find the mean and variance of Y.
If, as in Exercise 4.17, Y has density function

\[
f(y) = \begin{cases} (3/2)y^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\]

find the mean and variance of Y.


If, as in Exercise 4.18, Y has density function

\[
f(y) = \begin{cases} .2, & -1 < y \le 0, \\ .2 + (1.2)y, & 0 < y \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\]

find the mean and variance of Y.
Prove Theorem 4.5. 


If Y is a continuous random variable with density function f(y), use Theorem 4.5 to prove
that σ² = V(Y) = E(Y²) − [E(Y)]².


If, as in Exercise 4.19, Y has distribution function

\[
F(y) = \begin{cases}
0, & y \le 0, \\
\dfrac{y}{8}, & 0 < y < 2, \\
\dfrac{y^2}{16}, & 2 \le y < 4, \\
1, & y \ge 4,
\end{cases}
\]

find the mean and variance of Y.


If Y is a continuous random variable with mean μ and variance σ² and a and b are constants,
use Theorem 4.5 to prove the following:


a E(aY + b) = aE(Y) + b = aμ + b.
b V(aY + b) = a²V(Y) = a²σ².

For certain ore samples, the proportion Y of impurities per sample is a random variable with
density function given in Exercise 4.21. The dollar value of each sample is W = 5 − .5Y. Find
the mean and variance of W.


The proportion of time per day that all checkout counters in a supermarket are busy is a random
variable Y with density function

\[
f(y) = \begin{cases} cy^2(1 - y)^4, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of c that makes f(y) a probability density function.
b Find E(Y).


4.29 


4.30 


4.31 


4.32 


4.33 
The temperature Y at which a thermostatically controlled switch turns on has probability density
function given by

\[
f(y) = \begin{cases} 1/2, & 59 \le y \le 61, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find E(Y) and V(Y).


The proportion of time Y that an industrial robot is in operation during a 40-hour week is a
random variable with probability density function

\[
f(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find E(Y) and V(Y).
b For the robot under study, the profit X for a week is given by X = 200Y − 60. Find E(X)
and V(X).
c Find an interval in which the profit should lie for at least 75% of the weeks that the robot
is in use.


The pH of water samples from a specific lake is a random variable Y with probability density
function given by

\[
f(y) = \begin{cases} (3/8)(7 - y)^2, & 5 \le y \le 7, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find E(Y) and V(Y).

b Find an interval shorter than (5, 7) in which at least three-fourths of the pH measurements
must lie.

c Would you expect to see a pH measurement below 5.5 very often? Why?

Weekly CPU time used by an accounting firm has probability density function (measured in
hours) given by

\[
f(y) = \begin{cases} (3/64)y^2(4 - y), & 0 \le y \le 4, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the expected value and variance of weekly CPU time.


b The CPU time costs the firm $200 per hour. Find the expected value and variance of the
weekly cost for CPU time.


c Would you expect the weekly cost to exceed $600 very often? Why?


Daily total solar radiation for a specified location in Florida in October has probability density
function given by

\[
f(y) = \begin{cases} (3/32)(y - 2)(6 - y), & 2 \le y \le 6, \\ 0, & \text{elsewhere,} \end{cases}
\]

with measurements in hundreds of calories. Find the expected daily solar radiation for October.


Suppose that Y is a continuous random variable with density f(y) that is positive only if y ≥ 0.
If F(y) is the distribution function, show that

\[
E(Y) = \int_{0}^{\infty} y f(y)\,dy = \int_{0}^{\infty} [1 - F(y)]\,dy.
\]

[Hint: If y > 0, then y = ∫₀^y dt, and

\[
E(Y) = \int_{0}^{\infty} y f(y)\,dy = \int_{0}^{\infty} \left\{ \int_{0}^{y} dt \right\} f(y)\,dy.
\]

Exchange the order of integration to obtain the desired result.]⁴


4. Exercises preceded by an asterisk are optional. 
FIGURE 4.9 Density function for Y


If Y is a continuous random variable such that E[(Y − a)²] < ∞ for all a, show that E[(Y − a)²]
is minimized when a = E(Y). [Hint: E[(Y − a)²] = E({[Y − E(Y)] + [E(Y) − a]}²).]


Is the result obtained in Exercise 4.35 also valid for discrete random variables? Why? 


If Y is a continuous random variable with density function f(y) that is symmetric about 0
(that is, f(y) = f(−y) for all y) and E(Y) exists, show that E(Y) = 0. [Hint:

\[
E(Y) = \int_{-\infty}^{0} y f(y)\,dy + \int_{0}^{\infty} y f(y)\,dy.
\]

Make the change of variable w = −y in the first integral.]


The Uniform Probability Distribution 


Suppose that a bus always arrives at a particular stop between 8:00 and 8:10 A.M. 
and that the probability that the bus will arrive in any given subinterval of time is 
proportional only to the length of the subinterval. That is, the bus is as likely to arrive 
between 8:00 and 8:02 as it is to arrive between 8:06 and 8:08. Let Y denote the 
length of time a person must wait for the bus if that person arrived at the bus stop at 
exactly 8:00. If we carefully measured in minutes how long after 8:00 the bus arrived 
for several mornings, we could develop a relative frequency histogram for the data. 

From the description just given, it should be clear that the relative frequency with 
which we observed a value of Y between 0 and 2 would be approximately the same 
as the relative frequency with which we observed a value of Y between 6 and 8. A 
reasonable model for the density function of Y is given in Figure 4.9. Because areas
under curves represent probabilities for continuous random variables and A₁ = A₂
(by inspection), it follows that P(0 ≤ Y ≤ 2) = P(6 ≤ Y ≤ 8), as desired.

The random variable Y just discussed is an example of a random variable that has
a uniform distribution. The general form for the density function of a random variable
with a uniform distribution is as follows.


If θ₁ < θ₂, a random variable Y is said to have a continuous uniform probability
distribution on the interval (θ₁, θ₂) if and only if the density function of Y is

\[
f(y) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1}, & \theta_1 \le y \le \theta_2, \\ 0, & \text{elsewhere.} \end{cases}
\]

In the bus problem we can take θ₁ = 0 and θ₂ = 10 because we are interested only
in a particular ten-minute interval. The density function discussed in Example 4.2 is
a uniform distribution with θ₁ = 0 and θ₂ = 1. Graphs of the distribution function
and density function for the random variable in Example 4.2 are given in Figures 4.4
and 4.5, respectively.


The constants that determine the specific form of a density function are called 
parameters of the density function. 


The quantities θ₁ and θ₂ are parameters of the uniform density function and are
clearly meaningful numerical values associated with the theoretical density function.
Both the range and the probability that Y will fall in any given interval depend on the
values of θ₁ and θ₂.

Some continuous random variables in the physical, management, and biological 
sciences have approximately uniform probability distributions. For example, suppose 
that the number of events, such as calls coming into a switchboard, that occur in the 
time interval (0, t) has a Poisson distribution. If it is known that exactly one such event
has occurred in the interval (0, t), then the actual time of occurrence is distributed 
uniformly over this interval. 


EXAMPLE 4.7


Arrivals of customers at a checkout counter follow a Poisson distribution. It is known
that, during a given 30-minute period, one customer arrived at the counter. Find
the probability that the customer arrived during the last 5 minutes of the 30-minute
period.


Solution


As just mentioned, the actual time of arrival follows a uniform distribution over the
interval of (0, 30). If Y denotes the arrival time, then

\[
P(25 \le Y \le 30) = \int_{25}^{30} \frac{1}{30}\,dy = \frac{30 - 25}{30} = \frac{5}{30} = \frac{1}{6}.
\]

The probability of the arrival occurring in any other 5-minute interval is also 1/6.
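
The claim behind Example 4.7 (given exactly one Poisson event in an interval, its time is uniform on that interval) can be illustrated by simulation. The sketch below is written in Python, which the text itself does not use, and rests on several assumptions made only for illustration: arrivals are generated as a Poisson process through exponential gaps, the rate of one arrival per 30 minutes is arbitrary, and the function names, number of trials, and seed are all hypothetical choices.

```python
import random


def arrival_times(rate, horizon, rng):
    """Simulate Poisson-process arrival times on (0, horizon) via exponential gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def estimate_last_five(trials=60000, horizon=30.0, rate=1 / 30, seed=1):
    """Among periods with exactly one arrival, estimate P(arrival in last 5 minutes)."""
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(trials):
        times = arrival_times(rate, horizon, rng)
        if len(times) == 1:                    # condition on exactly one arrival
            kept += 1
            if times[0] >= horizon - 5:
                hits += 1
    return hits / kept, kept


estimate, kept = estimate_last_five()
print(f"periods kept: {kept}, estimate: {estimate:.3f}, exact: {1 / 6:.3f}")
```

Because only periods with exactly one arrival are retained, the estimate should settle near 1/6 regardless of the assumed rate; changing the rate alters only how many periods are kept.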


As we will see, the uniform distribution is very important for theoretical reasons. 
Simulation studies are valuable techniques for validating models in statistics. If we 
desire a set of observations on a random variable Y with distribution function F(y), 
we often can obtain the desired results by transforming a set of observations on a 
uniform random variable. For this reason most computer systems contain a random 
number generator that generates observed values for a random variable that has a 
continuous uniform distribution. 
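
As one illustration of transforming uniform observations, the sketch below uses the density of Example 4.6, f(y) = (3/8)y² on 0 ≤ y ≤ 2, whose distribution function is F(y) = y³/8 on that interval. Setting u = F(y) and solving gives y = 2u^(1/3). The code is Python (not a language used by the text), and the function name, seed, and sample size are arbitrary assumptions.

```python
import random


def sample_example_4_6(u):
    """Invert F(y) = y**3 / 8 on [0, 2]: return the y that solves u = F(y)."""
    return 2.0 * u ** (1.0 / 3.0)


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
draws = [sample_example_4_6(rng.random()) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((y - sample_mean) ** 2 for y in draws) / (len(draws) - 1)

print(f"sample mean {sample_mean:.3f} (theory 1.5)")
print(f"sample variance {sample_var:.3f} (theory 0.15)")
```

With a large sample, the sample mean and variance should fall close to the values 1.5 and 0.15 found in Example 4.6.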
Proof


4.38 


4.39 


4.40 


4.41 


4.42 


4.43 


4.44 


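If θ₁ < θ₂ and Y is a random variable uniformly distributed on the interval
(θ₁, θ₂), then

\[
\mu = E(Y) = \frac{\theta_1 + \theta_2}{2} \qquad \text{and} \qquad \sigma^2 = V(Y) = \frac{(\theta_2 - \theta_1)^2}{12}.
\]

By Definition 4.5,

\[
E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy
     = \int_{\theta_1}^{\theta_2} y \left(\frac{1}{\theta_2 - \theta_1}\right) dy
     = \left(\frac{1}{\theta_2 - \theta_1}\right) \frac{y^2}{2} \Big|_{\theta_1}^{\theta_2}
     = \frac{\theta_2^2 - \theta_1^2}{2(\theta_2 - \theta_1)}
     = \frac{\theta_2 + \theta_1}{2}.
\]

Note that the mean of a uniform random variable is simply the value midway
between the two parameter values, θ₁ and θ₂. The derivation of the variance is
left as an exercise.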
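
A quick numerical check (not a derivation) of the mean and variance stated above can be made for the bus example, where the interval is (0, 10). The Python sketch below uses a midpoint-rule integrator; the helper name midpoint_integral and the number of steps are assumptions chosen only for illustration.

```python
def midpoint_integral(func, lo, hi, steps=100_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))


theta1, theta2 = 0.0, 10.0  # bus example: arrival between 8:00 and 8:10


def density(y):
    """Uniform density on (theta1, theta2); only evaluated inside the interval here."""
    return 1.0 / (theta2 - theta1)


mean = midpoint_integral(lambda y: y * density(y), theta1, theta2)
second = midpoint_integral(lambda y: y * y * density(y), theta1, theta2)
variance = second - mean ** 2

print(f"numerical mean {mean:.4f}, formula {(theta1 + theta2) / 2:.4f}")
print(f"numerical variance {variance:.4f}, formula {(theta2 - theta1) ** 2 / 12:.4f}")
```

Other values of theta1 and theta2 can be substituted to compare the numerical results with the two formulas.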
Exercises 
Suppose that Y has a uniform distribution over the interval (0, 1). 
a Find F(y). 
b Show that P(a ≤ Y ≤ a + b), for a ≥ 0, b ≥ 0, and a + b ≤ 1, depends only upon the
value of b.


If a parachutist lands at a random point on a line between markers A and B, find the probability 
that she is closer to A than to B. Find the probability that her distance to A is more than three 
times her distance to B. 


Suppose that three parachutists operate independently as described in Exercise 4.39. What is 
the probability that exactly one of the three lands past the midpoint between A and B? 


A random variable Y has a uniform distribution over the interval (θ₁, θ₂). Derive the variance
of Y.


The median of the distribution of a continuous random variable Y is the value φ.5 such that
P(Y ≤ φ.5) = 0.5. What is the median of the uniform distribution on the interval (θ₁, θ₂)?


A circle of radius r has area A = πr². If a random circle has a radius that is uniformly
distributed on the interval (0, 1), what are the mean and variance of the area of the circle?


The change in depth of a river from one day to the next, measured (in feet) at a specific location,
is a random variable Y with the following density function:

\[
f(y) = \begin{cases} k, & -2 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]
4.53


4.54 




a Determine the value of k. 
b Obtain the distribution function for Y.


Upon studying low bids for shipping contracts, a microcomputer manufacturing company finds 
that intrastate contracts have low bids that are uniformly distributed between 20 and 25, in units 
of thousands of dollars. Find the probability that the low bid on the next intrastate shipping 
contract 


a is below $22,000.
b is in excess of $24,000.


Refer to Exercise 4.45. Find the expected value of low bids on contracts of the type described 
there. 


The failure of a circuit board interrupts work that utilizes a computing system until a new board 
is delivered. The delivery time, Y, is uniformly distributed on the interval one to five days. The 
cost of a board failure and interruption includes the fixed cost c₀ of a new board and a cost that
increases proportionally to Y². If C is the cost incurred, C = c₀ + c₁Y².


a Find the probability that the delivery time exceeds two days. 


b In terms of c₀ and c₁, find the expected cost associated with a single failed circuit board.


Beginning at 12:00 midnight, a computer center is up for one hour and then down for two hours 
on a regular cycle. A person who is unaware of this schedule dials the center at a random time
between 12:00 midnight and 5:00 A.M. What is the probability that the center is up when the 
person's call comes in? 


A telephone call arrived at a switchboard at random within a one-minute interval. The switchboard
was fully busy for 15 seconds into this one-minute period. What is the probability that
the call arrived when the switchboard was not fully busy? 


If a point is randomly located in an interval (a, b) and if Y denotes the location of the point, 
then Y is assumed to have a uniform distribution over (a, b). A plant efficiency expert randomly 
selects a location along a 500-foot assembly line from which to observe the work habits of the 
workers on the line. What is the probability that the point she selects is 


a within 25 feet of the end of the line? 
b within 25 feet of the beginning of the line? 


c closer to the beginning of the line than to the end of the line?


The cycle time for trucks hauling concrete to a highway construction site is uniformly distributed 
over the interval 50 to 70 minutes. What is the probability that the cycle time exceeds 65 minutes
if it is known that the cycle time exceeds 55 minutes?


Refer to Exercise 4.51. Find the mean and variance of the cycle times for the trucks. 


The number of defective circuit boards coming off a soldering machine follows a Poisson 
distribution. During a specific eight-hour day, one defective circuit board was found. 


a Find the probability that it was produced during the first hour of operation during that day. 
b Find the probability that it was produced during the last hour of operation during that day.


c Given that no defective circuit boards were produced during the first four hours of operation,
find the probability that the defective board was manufactured during the fifth hour.


In using the triangulation method to determine the range of an acoustic source, the test
equipment must accurately measure the time at which the spherical wave front arrives at a receiving
4.55


4.56 


4.57 


4.5 


DEFINITION 4.8 


THEOREM 4.7 


sensor. According to Perruzzi and Hilliard (1984), measurement errors in these times can be
modeled as possessing a uniform distribution from −0.05 to +0.05 μs (microseconds).


a What is the probability that a particular arrival-time measurement will be accurate to within
0.01 μs?


b Find the mean and variance of the measurement errors. 


Refer to Exercise 4.54. Suppose that measurement errors are uniformly distributed between
−0.02 and +0.05 μs.


a What is the probability that a particular arrival-time measurement will be accurate to within
0.01 μs?


b Find the mean and variance of the measurement errors. 


Refer to Example 4.7. Find the conditional probability that a customer arrives during the last 
5 minutes of the 30-minute period if it is known that no one arrives during the first 10 minutes 
of the period. 


According to Zimmels (1983), the sizes of particles used in sedimentation experiments often 
have a uniform distribution. In sedimentation involving mixtures of particles of various sizes, 
the larger particles hinder the movements of the smaller ones. Thus, it is important to study 
both the mean and the variance of particle sizes. Suppose that spherical particles have diameters 
that are uniformly distributed between .01 and .05 centimeters. Find the mean and variance of 
the volumes of these particles. (Recall that the volume of a sphere is (4/3)πr³.)


The Normal Probability Distribution 


The most widely used continuous probability distribution is the normal distribution, 
a distribution with the familiar bell shape that was discussed in connection with the 
empirical rule. The examples and exercises in this section illustrate some of the many 
random variables that have distributions that are closely approximated by a normal 
probability distribution. In Chapter 7 we will present an argument that at least partially 
explains the common occurrence of normal distributions of data in nature. The normal 
density function is as follows: 


A random variable Y is said to have a normal probability distribution if and
only if, for σ > 0 and −∞ < μ < ∞, the density function of Y is

\[
f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}, \qquad -\infty < y < \infty.
\]

Notice that the normal density function contains two parameters, μ and σ.


If Y is a normally distributed random variable with parameters μ and σ, then


E(Y) = μ and V(Y) = σ².


FIGURE 4.10 The normal probability density function


The proof of this theorem will be deferred to Section 4.9, where we derive the
moment-generating function of a normally distributed random variable. The results
contained in Theorem 4.7 imply that the parameter μ locates the center of the
distribution and that σ measures its spread. A graph of a normal density function is shown
in Figure 4.10.

Areas under the normal density function corresponding to P(a ≤ Y ≤ b) require
evaluation of the integral

\[
\int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}\,dy.
\]


Unfortunately, a closed-form expression for this integral does not exist; hence, its
evaluation requires the use of numerical integration techniques. Probabilities and
quantiles for random variables with normal distributions are easily found using R
and S-Plus. If Y has a normal distribution with mean μ and standard deviation
σ, the R (or S-Plus) command pnorm(y0, μ, σ) generates P(Y ≤ y0) whereas
qnorm(p, μ, σ) yields the pth quantile, the value of φp such that P(Y ≤ φp) = p.
Although there are infinitely many normal distributions (μ can take on any finite value,
whereas σ can assume any positive finite value), we need only one table (Table 4,
Appendix 3) to compute areas under normal densities. Probabilities and quantiles
associated with normally distributed random variables can also be found using the
applet Normal Tail Areas and Quantiles accessible at www.thomsonedu.com/statistics/
wackerly. The only real benefit associated with using software to obtain probabilities
and quantiles associated with normally distributed random variables is that
the software provides answers that are correct to a greater number of decimal
places.
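
For readers without R or S-Plus, the same two quantities can be obtained in Python. This is a sketch under the assumption that Python 3.8 or later is available, where the standard library class statistics.NormalDist provides cdf (playing the role of pnorm) and inv_cdf (playing the role of qnorm). The mean 75 and standard deviation 10 are illustrative values only.

```python
from statistics import NormalDist

mu, sigma = 75.0, 10.0          # illustrative parameter values
Y = NormalDist(mu, sigma)

# Analogue of pnorm(90, 75, 10): P(Y <= 90).
prob_below_90 = Y.cdf(90.0)

# Analogue of qnorm(.95, 75, 10): the value phi such that P(Y <= phi) = .95.
quantile_95 = Y.inv_cdf(0.95)

print(f"P(Y <= 90) = {prob_below_90:.4f}")
print(f".95-quantile = {quantile_95:.3f}")
```

Applying cdf to the computed quantile should return .95 up to rounding, which is a convenient check that the two calls are inverses of each other.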

The normal density function is symmetric around the value μ, so areas need be
tabulated on only one side of the mean. The tabulated areas are to the right of points z, 
where z is the distance from the mean, measured in standard deviations. This area is 
shaded in Figure 4.11. 


EXAMPLE 4.8 


Let Z denote a normal random variable with mean 0 and standard deviation 1. 
a Find P(Z > 2). 
b Find P(−2 ≤ Z ≤ 2).
c Find P(0 ≤ Z ≤ 1.73).
FIGURE 4.11 Tabulated area for the normal density function


FIGURE 4.12 Desired area for Example 4.8(b)


Solution

a Since μ = 0 and σ = 1, the value 2 is actually z = 2 standard deviations
above the mean. Proceed down the first (z) column in Table 4, Appendix 3,
and read the area opposite z = 2.0. This area, denoted by the symbol A(z), is
A(2.0) = .0228. Thus, P(Z > 2) = .0228.

b Refer to Figure 4.12, where we have shaded the area of interest. In part (a)
we determined that A₁ = A(2.0) = .0228. Because the density function is
symmetric about the mean μ = 0, it follows that A₂ = A₁ = .0228 and hence
that

P(−2 ≤ Z ≤ 2) = 1 − A₁ − A₂ = 1 − 2(.0228) = .9544.

c Because P(Z > 0) = A(0) = .5, we obtain that P(0 ≤ Z ≤ 1.73) =
.5 − A(1.73), where A(1.73) is obtained by proceeding down the z column in
Table 4, Appendix 3, to the entry 1.7 and then across the top of the table to the
column labeled .03 to read A(1.73) = .0418. Thus,

P(0 ≤ Z ≤ 1.73) = .5 − .0418 = .4582.
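
The three table lookups in Example 4.8 can be cross-checked in software. The sketch below assumes Python 3.8 or later and uses statistics.NormalDist from the standard library; the helper upper_tail is a hypothetical name introduced here to mirror the tabulated area A(z).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail(z):
    """Area to the right of z, the quantity tabulated as A(z)."""
    return 1.0 - Z.cdf(z)


part_a = upper_tail(2.0)               # P(Z > 2)
part_b = 1.0 - 2.0 * upper_tail(2.0)   # P(-2 <= Z <= 2), using symmetry
part_c = 0.5 - upper_tail(1.73)        # P(0 <= Z <= 1.73)

print(f"(a) {part_a:.4f}  (b) {part_b:.4f}  (c) {part_c:.4f}")
```

Small differences in the fourth decimal place from the tabled answers can occur, because the example rounds A(2.0) to .0228 before doubling it.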


EXAMPLE 4.9


The achievement scores for a college entrance examination are normally distributed
with mean 75 and standard deviation 10. What fraction of the scores lies between 80
and 90?


Solution


Recall that z is the distance from the mean of a normal distribution expressed in units
of standard deviation. Thus,

\[
z = \frac{y - \mu}{\sigma}.
\]

Then the desired fraction of the population is given by the area between

\[
z_1 = \frac{80 - 75}{10} = .5 \qquad \text{and} \qquad z_2 = \frac{90 - 75}{10} = 1.5.
\]

This area is shaded in Figure 4.13.

You can see from Figure 4.13 that A = A(.5) − A(1.5) = .3085 − .0668 = .2417.


FIGURE 4.13 Required area for Example 4.9


4.58 


We can always transform a normal random variable Y to a standard normal random
variable Z by using the relationship

\[
Z = \frac{Y - \mu}{\sigma}.
\]

Table 4, Appendix 3, can then be used to compute probabilities, as shown here.
Z locates a point measured from the mean of a normal random variable, with the
distance expressed in units of the standard deviation of the original normal random
variable. Thus, the mean value of Z must be 0, and its standard deviation must equal 1.
The proof that the standard normal random variable, Z, is normally distributed with
mean 0 and standard deviation 1 is given in Chapter 6.

The applet Normal Probabilities, accessible at www.thomsonedu.com/statistics/
wackerly, illustrates the correspondence between normal probabilities on the original
and transformed (z) scales. To answer the question posed in Example 4.9, locate the
interval of interest, (80, 90), on the lower horizontal axis labeled Y. The corresponding
z-scores are given on the upper horizontal axis, and it is clear that the shaded
area gives P(80 < Y < 90) = P(0.5 < Z < 1.5) = 0.2417 (see Figure 4.14).
A few of the exercises at the end of this section suggest that you use this applet to
reinforce the calculations of probabilities associated with normally distributed
random variables.
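
The correspondence between the original and transformed scales can also be seen numerically. The following Python sketch (assuming Python 3.8 or later for statistics.NormalDist; the function name standardize is hypothetical) computes the probability in Example 4.9 twice, once after converting the endpoints to z-scores and once directly on the original scale.

```python
from statistics import NormalDist


def standardize(y, mu, sigma):
    """Return the z-score (y - mu) / sigma."""
    return (y - mu) / sigma


mu, sigma = 75.0, 10.0                 # exam scores in Example 4.9
z1 = standardize(80.0, mu, sigma)
z2 = standardize(90.0, mu, sigma)

standard = NormalDist()                # mean 0, standard deviation 1
original = NormalDist(mu, sigma)

via_z = standard.cdf(z2) - standard.cdf(z1)         # P(z1 < Z < z2)
direct = original.cdf(90.0) - original.cdf(80.0)    # P(80 < Y < 90)

print(f"z-scores: {z1}, {z2}")
print(f"via z: {via_z:.4f}, direct: {direct:.4f}")
```

Both routes should give the same area, matching the value .2417 obtained from Table 4.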


Exercises 


Use Table 4, Appendix 3, to find the following probabilities for a standard normal random 
variable Z: 


a P(0 ≤ Z ≤ 1.2)
b P(−.9 ≤ Z ≤ 0)
c P(.3 ≤ Z ≤ 1.56)
FIGURE 4.14 Required area for Example 4.9, using both the original and transformed (z) scales:
P(80 < Y < 90) = P(0.50 < Z < 1.50) = 0.2417


d P(−.2 ≤ Z ≤ .2)
e P(−1.56 ≤ Z ≤ −.2)

f Applet Exercise Use the applet Normal Probabilities to obtain P(0 < Z < 1.2). Why 
are the values given on the two horizontal axes identical? 


If Z is a standard normal random variable, find the value zo such that 


a P(Z > z0) = .5.
b P(Z < z0) = .8643.
c P(−z0 < Z < z0) = .90.
d P(−z0 < Z < z0) = .99.

A normally distributed random variable has density function

\[
f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}, \qquad -\infty < y < \infty.
\]

Using the fundamental properties associated with any density function, argue that the parameter
σ must be such that σ > 0.


What is the median of a normally distributed random variable with mean μ and standard
deviation σ?


If Z is a standard normal random variable, what is 


a P(Z² < 1)?
b P(Z² < 3.84146)?


A company that manufactures and bottles apple juice uses a machine that automatically fills 
16-ounce bottles. There is some variation, however, in the amounts of liquid dispensed into the 
bottles that are filled. The amount dispensed has been observed to be approximately normally 
distributed with mean 16 ounces and standard deviation 1 ounce. 
a Use Table 4, Appendix 3, to determine the proportion of bottles that will have more than
17 ounces dispensed into them.


b Applet Exercise Use the applet Normal Probabilities to obtain the answer to part (a). 


The weekly amount of money spent on maintenance and repairs by a company was observed, 
over a long period of time, to be approximately normally distributed with mean $400 and 
standard deviation $20. If $450 is budgeted for next week, what is the probability that the 
actual costs will exceed the budgeted amount? 


a Answer the question, using Table 4, Appendix 3. 
b Applet Exercise Use the applet Normal Probabilities to obtain the answer. 
c Why are the labeled values different on the two horizontal axes?


In Exercise 4.64, how much should be budgeted for weekly repairs and maintenance to provide 
that the probability the budgeted amount will be exceeded in a given week is only .1? 


A machining operation produces bearings with diameters that are normally distributed with 
mean 3.0005 inches and standard deviation .0010 inch. Specifications require the bearing diameters
to lie in the interval 3.000 ± .0020 inches. Those outside the interval are considered scrap
and must be remachined. With the existing machine setting, what fraction of total production 
will be scrap? 


a Answer the question, using Table 4, Appendix 3. 
b Applet Exercise Obtain the answer, using the applet Normal Probabilities. 


In Exercise 4.66, what should the mean diameter be in order that the fraction of bearings 
scrapped be minimized? 


The grade point averages (GPAs) of a large population of college students are approximately 
normally distributed with mean 2.4 and standard deviation .8. What fraction of the students 
will possess a GPA in excess of 3.0?


a Answer the question, using Table 4, Appendix 3. 
b Applet Exercise Obtain the answer, using the applet Normal Tail Areas and Quantiles. 


Refer to Exercise 4.68. If students possessing a GPA less than 1.9 are dropped from college, 
what percentage of the students will be dropped? 


Refer to Exercise 4.68. Suppose that three students are randomly selected from the student 
body. What is the probability that all three will possess a GPA in excess of 3.0? 


Wires manufactured for use in a computer system are specified to have resistances between 
.12 and .14 ohms. The actual measured resistances of the wires produced by company A have 
a normal probability distribution with mean .13 ohm and standard deviation .005 ohm. 


a What is the probability that a randomly selected wire from company A's production will 
meet the specifications? 

b If four of these wires are used in each computer system and all are selected from com- 
pany A, what is the probability that all four in a randomly selected system will meet the 
specifications? 


One method of arriving at economic forecasts is to use a consensus approach. A forecast is 
obtained from each of a large number of analysts; the average of these individual forecasts is 
the consensus forecast. Suppose that the individual 1996 January prime interest-rate forecasts 
of all economic analysts are approximately normally distributed with mean 7% and standard
deviation 2.6%. If a single analyst is randomly selected from among this group, what is the
probability that the analyst's forecast of the prime interest rate will


a exceed 11%?
b be less than 9%?


The width of bolts of fabric is normally distributed with mean 950 mm (millimeters) and 
standard deviation 10 mm. 


a What is the probability that a randomly chosen bolt has a width of between 947 and 958 mm?


b What is the appropriate value for C such that a randomly chosen bolt has a width less than 
C with probability .8531? 


Scores on an examination are assumed to be normally distributed with mean 78 and variance 36. 


a What is the probability that a person taking the examination scores higher than 72?

b Suppose that students scoring in the top 10% of this distribution are to receive an A grade. 
What is the minimum score a student must achieve to earn an A grade? 

c What must be the cutoff point for passing the examination if the examiner wants only the 
top 28.1% of all scores to be passing? 


d Approximately what proportion of students have scores 5 or more points above the score 
that cuts off the lowest 25%?


e Applet Exercise Answer parts (a)-(d), using the applet Normal Tail Areas and Quantiles. 


f If it is known that a student's score exceeds 72, what is the probability that his or her score
exceeds 84? 


A soft-drink machine can be regulated so that it discharges an average of μ ounces per cup. If
the ounces of fill are normally distributed with standard deviation 0.3 ounce, give the setting
for μ so that 8-ounce cups will overflow only 1% of the time.


The machine described in Exercise 4.75 has standard deviation σ that can be fixed at certain
levels by carefully adjusting the machine. What is the largest value of σ that will allow the
actual amount dispensed to fall within 1 ounce of the mean with probability at least .95?


The SAT and ACT college entrance exams are taken by thousands of students each year. The 
mathematics portions of each of these exams produce scores that are approximately normally 
distributed. In recent years, SAT mathematics exam scores have averaged 480 with standard 
deviation 100. The average and standard deviation for ACT mathematics scores are 18 and 6, 
respectively. 


a An engineering school sets 550 as the minimum SAT math score for new students. What
percentage of students will score below 550 in a typical year? 

b What score should the engineering school set as a comparable standard on the ACT 
math test? 


Show that the maximum value of the normal density with parameters μ and σ is 1/(σ√(2π))
and occurs when y = μ.


Show that the normal density with parameters μ and σ has inflection points at the values μ - σ
and μ + σ. (Recall that an inflection point is a point where the curve changes direction from
concave up to concave down, or vice versa, and occurs when the second derivative changes 
sign. Such a change in sign may occur when the second derivative equals zero.) 


Assume that Y is normally distributed with mean μ and standard deviation σ. After observing
a value of Y, a mathematician constructs a rectangle with length L = |Y| and width W = 3|Y|.
Let A denote the area of the resulting rectangle. What is E(A)?


The Gamma Probability Distribution


FIGURE 4.15 A skewed probability density function


Some random variables are always nonnegative and for various reasons yield dis- 
tributions of data that are skewed (nonsymmetric) to the right. That is, most of the 
area under the density function is located near the origin, and the density function 
drops gradually as y increases. A skewed probability density function is shown in 
Figure 4.15. 

The lengths of time between malfunctions for aircraft engines possess a skewed 
frequency distribution, as do the lengths of time between arrivals at a supermarket 
checkout queue (that is, the line at the checkout counter). Similarly, the lengths of 
time to complete a maintenance checkup for an automobile or aircraft engine possess 
a skewed frequency distribution. The populations associated with these random vari- 
ables frequently possess density functions that are adequately modeled by a gamma 
density function. 


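A random variable Y is said to have a gamma distribution with parameters
α > 0 and β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 \le y < \infty, \\[2ex] 0, & \text{elsewhere,} \end{cases}$$

where

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\, dy.$$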


The quantity Γ(α) is known as the gamma function. Direct integration will verify
that Γ(1) = 1. Integration by parts will verify that Γ(α) = (α - 1)Γ(α - 1) for any
α > 1 and that Γ(n) = (n - 1)!, provided that n is a positive integer.
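The three facts about the gamma function stated above can be spot-checked numerically. The following Python sketch relies only on the standard library function math.gamma; the helper names and the non-integer value 3.5 are choices made here for illustration and do not come from the text.

```python
import math

def recursion_gap(alpha):
    """Difference between Gamma(alpha) and (alpha - 1) * Gamma(alpha - 1)."""
    return math.gamma(alpha) - (alpha - 1.0) * math.gamma(alpha - 1.0)

def factorial_identity_holds(n):
    """Check Gamma(n) = (n - 1)! for a positive integer n."""
    return math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(1) = 1, as direct integration of e^{-y} over (0, infinity) shows.
print(math.gamma(1.0))

# The recursion, checked at an arbitrary non-integer value greater than 1.
print(abs(recursion_gap(3.5)) < 1e-12)

# The factorial identity for n = 1, ..., 7.
print(all(factorial_identity_holds(n) for n in range(1, 8)))
```

A numerical check of this kind does not replace the integration-by-parts argument; it only confirms that the identities behave as claimed at the values tried.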

Graphs of gamma density functions for α = 1, 2, and 4 and β = 1 are given in
Figure 4.16. Notice in Figure 4.16 that the shape of the gamma density differs for
the different values of α. For this reason, α is sometimes called the shape parameter
associated with a gamma distribution. The parameter β is generally called the scale
parameter because multiplying a gamma-distributed random variable by a positive
constant (and thereby changing the scale on which the measurement is made) produces
a random variable that also has a gamma distribution with the same value of α (shape
parameter) but with an altered value of β.

FIGURE 4.16 Gamma density functions, β = 1

In the special case when α is an integer, the distribution function of a gamma-
distributed random variable can be expressed as a sum of certain Poisson probabilities.
You will find this representation in Exercise 4.99. If α is not an integer and 0 < c <
d < ∞, it is impossible to give a closed-form expression for

$$\int_c^d \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy.$$

As a result, except when α = 1 (an exponential distribution), it is impossible
to obtain areas under the gamma density function by direct integration. Tabulated
values for integrals like the above are given in Tables of the Incomplete Gamma
Function (Pearson 1965). By far the easiest way to compute probabilities associated
with gamma-distributed random variables is to use available statistical software.
If Y is a gamma-distributed random variable with parameters α and β, the
R (or S-Plus) command pgamma(y_0, α, 1/β) generates P(Y ≤ y_0), whereas
qgamma(p, α, 1/β) yields the pth quantile, the value φ_p such that P(Y ≤ φ_p) =
p. In addition, one of the applets, Gamma Probabilities and Quantiles, accessible at
www.thomsonedu.com/statistics/wackerly, can be used to determine probabilities and
quantiles associated with gamma-distributed random variables. Another applet at the
Thomson website, Comparison of Gamma Density Functions, will permit you to visualize
and compare gamma density functions with different values for α and/or β.
These applets will be used to answer some of the exercises at the end of this section.
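For readers without R or the applets at hand, the same two quantities can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function names gamma_cdf and gamma_quantile are hypothetical, the distribution function is computed from a standard power series for the incomplete gamma function (not the algorithm R uses), and the series is intended for moderate values of y/β.

```python
import math

def gamma_cdf(y, alpha, beta):
    """Approximate P(Y <= y) for a gamma variable with shape alpha and scale beta."""
    if y <= 0:
        return 0.0
    x = y / beta
    # Series: P = x^alpha * e^{-x} * sum_{n >= 0} x^n / Gamma(alpha + n + 1).
    term = math.exp(alpha * math.log(x) - x - math.lgamma(alpha + 1.0))
    total = term
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= x / (alpha + n)   # ratio of consecutive series terms
        total += term
        n += 1
    return min(total, 1.0)

def gamma_quantile(p, alpha, beta):
    """Approximate the p-th quantile (0 < p < 1) by bisection on gamma_cdf."""
    lo, hi = 0.0, alpha * beta + 1.0
    while gamma_cdf(hi, alpha, beta) < p:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values only: alpha = 2, beta = 1.
print(gamma_cdf(1.0, 2.0, 1.0))        # should agree with 1 - 2/e
print(gamma_quantile(0.5, 2.0, 1.0))   # median of this distribution
```

Here gamma_cdf(y0, alpha, beta) plays the role of pgamma with rate 1/β, and gamma_quantile(p, alpha, beta) the role of qgamma. When α = 1 the series reduces to 1 - e^{-y/β}, which gives a convenient check.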
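As indicated in the next theorem, the mean and variance of gamma-distributed
random variables are easy to compute.


THEOREM 4.8 If Y has a gamma distribution with parameters α and β, then

$$\mu = E(Y) = \alpha\beta \quad\text{and}\quad \sigma^2 = V(Y) = \alpha\beta^2.$$

Proof

$$E(Y) = \int_{-\infty}^{\infty} y f(y)\, dy = \int_0^{\infty} y \left( \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \right) dy.$$

By definition, the gamma density function is such that

$$\int_0^{\infty} \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy = 1.$$

Hence,

$$\int_0^{\infty} y^{\alpha-1} e^{-y/\beta}\, dy = \beta^{\alpha}\,\Gamma(\alpha),$$

and

$$\begin{aligned}
E(Y) &= \int_0^{\infty} \frac{y^{\alpha} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} y^{\alpha} e^{-y/\beta}\, dy \\
&= \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \left[ \beta^{\alpha+1}\,\Gamma(\alpha+1) \right] = \frac{\beta\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha\beta.
\end{aligned}$$

From Exercise 4.24, V(Y) = E[Y²] - [E(Y)]². Further,

$$\begin{aligned}
E(Y^2) &= \int_0^{\infty} y^2 \left( \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \right) dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} y^{\alpha+1} e^{-y/\beta}\, dy \\
&= \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \left[ \beta^{\alpha+2}\,\Gamma(\alpha+2) \right] = \frac{\beta^2(\alpha+1)\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha(\alpha+1)\beta^2.
\end{aligned}$$

Then V(Y) = E[Y²] - [E(Y)]² where, from the earlier part of the derivation,
E(Y) = αβ. Substituting E[Y²] and E(Y) into the formula for V(Y), we obtain

$$V(Y) = \alpha(\alpha+1)\beta^2 - (\alpha\beta)^2 = \alpha^2\beta^2 + \alpha\beta^2 - \alpha^2\beta^2 = \alpha\beta^2.$$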
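The two formulas of Theorem 4.8 can also be checked by simulation. The sketch below uses random.gammavariate from the Python standard library, whose two arguments are the shape and the scale, matching α and β as used here. The function name, the sample size, the seed, and the values α = 3, β = 2 are illustrative assumptions, not part of the text.

```python
import random

def simulate_gamma_moments(alpha, beta, n=100_000, seed=1):
    """Sample mean and variance of n simulated gamma(alpha, beta) values."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(alpha, beta) for _ in range(n)]
    mean = sum(draws) / n
    variance = sum((y - mean) ** 2 for y in draws) / n
    return mean, variance

alpha, beta = 3.0, 2.0   # illustrative values
mean, variance = simulate_gamma_moments(alpha, beta)
print(mean, alpha * beta)            # sample mean next to alpha * beta
print(variance, alpha * beta ** 2)   # sample variance next to alpha * beta^2
```

The simulated values fluctuate around αβ and αβ², and the agreement tightens as n grows. The proof above remains the actual justification.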

Two special cases of gamma-distributed random variables merit particular consid- 
eration. 
DEFINITION 4.10 Let ν be a positive integer. A random variable Y is said to have a chi-square
distribution with ν degrees of freedom if and only if Y is a gamma-distributed
random variable with parameters α = ν/2 and β = 2.


A random variable with a chi-square distribution is called a chi-square (χ²) random
variable. Such random variables occur often in statistical theory. The motivation
behind calling the parameter ν the degrees of freedom of the χ² distribution rests on
one of the major ways for generating a random variable with this distribution and is
given in Theorem 6.4. The mean and variance of a χ² random variable follow directly
from Theorem 4.8.


THEOREM 4.9 If Y is a chi-square random variable with ν degrees of freedom, then

$$\mu = E(Y) = \nu \quad\text{and}\quad \sigma^2 = V(Y) = 2\nu.$$

Proof Apply Theorem 4.8 with α = ν/2 and β = 2.


Tables that give probabilities associated with χ² distributions are readily available
in most statistics texts. Table 6, Appendix 3, gives percentage points associated with
χ² distributions for many choices of ν. Tables of the general gamma distribution
are not so readily available. However, we will show in Exercise 6.46 that if Y has a
gamma distribution with α = n/2 for some integer n, then 2Y/β has a χ² distribution
with n degrees of freedom. Hence, for example, if Y has a gamma distribution with
α = 1.5 = 3/2 and β = 4, then 2Y/β = 2Y/4 = Y/2 has a χ² distribution with
3 degrees of freedom. Thus, P(Y < 3.5) = P([Y/2] < 1.75) can be found by using
readily available tables of the χ² distribution.

The gamma density function in which α = 1 is called the exponential density
function.


DEFINITION 4.11 A random variable Y is said to have an exponential distribution with parameter
β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{1}{\beta}\, e^{-y/\beta}, & 0 \le y < \infty, \\[2ex] 0, & \text{elsewhere.} \end{cases}$$


The exponential density function is often useful for modeling the length of life
of electronic components. Suppose that the length of time a component already has
operated does not affect its chance of operating for at least b additional time units.
That is, the probability that the component will operate for more than a + b time units,
given that it has already operated for at least a time units, is the same as the probability
that a new component will operate for at least b time units if the new component is put
into service at time 0. A fuse is an example of a component for which this assumption
often is reasonable. We will see in the next example that the exponential distribution
provides a model for the distribution of the lifetime of such a component.


THEOREM 4.10 If Y is an exponential random variable with parameter β, then

$$\mu = E(Y) = \beta \quad\text{and}\quad \sigma^2 = V(Y) = \beta^2.$$

Proof The proof follows directly from Theorem 4.8 with α = 1.


EXAMPLE 4.10 


Suppose that Y has an exponential probability density function. Show that, if a > 0
and b > 0,

$$P(Y > a + b \mid Y > a) = P(Y > b).$$


Solution

From the definition of conditional probability, we have that

$$P(Y > a + b \mid Y > a) = \frac{P(Y > a + b)}{P(Y > a)}$$

because the intersection of the events (Y > a + b) and (Y > a) is the event (Y >
a + b). Now

$$P(Y > a + b) = \int_{a+b}^{\infty} \frac{1}{\beta}\, e^{-y/\beta}\, dy = -e^{-y/\beta} \Big]_{a+b}^{\infty} = e^{-(a+b)/\beta}.$$

Similarly,

$$P(Y > a) = \int_a^{\infty} \frac{1}{\beta}\, e^{-y/\beta}\, dy = e^{-a/\beta},$$

and

$$P(Y > a + b \mid Y > a) = \frac{e^{-(a+b)/\beta}}{e^{-a/\beta}} = e^{-b/\beta} = P(Y > b).$$

This property of the exponential distribution is often called the memoryless property
of the distribution.

You will recall from Chapter 3 that the geometric distribution, a discrete distribution,
also had this memoryless property. An interesting relationship between the
exponential and geometric distributions is given in Exercise 4.95.
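The memoryless property of the exponential distribution is easy to confirm numerically. In the following Python sketch the function names and the values a = 2, b = 3, β = 4 are chosen here purely for illustration.

```python
import math

def exp_survival(y, beta):
    """P(Y > y) = e^{-y/beta} for an exponential variable with mean beta."""
    return math.exp(-y / beta) if y > 0 else 1.0

def conditional_survival(a, b, beta):
    """P(Y > a + b | Y > a), computed from the definition of conditional probability."""
    return exp_survival(a + b, beta) / exp_survival(a, beta)

a, b, beta = 2.0, 3.0, 4.0   # illustrative values
print(conditional_survival(a, b, beta))   # P(Y > a + b | Y > a)
print(exp_survival(b, beta))              # P(Y > b); the two agree
```

Changing a leaves the first printed value unchanged, which is exactly what "memoryless" means: the time already survived carries no information about the remaining lifetime.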


Exercises 


a If α > 0, Γ(α) is defined by $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\, dy$. Show that Γ(1) = 1.
*b If α > 1, integrate by parts to prove that Γ(α) = (α - 1)Γ(α - 1).


Use the results obtained in Exercise 4.81 to prove that if n is a positive integer, then Γ(n) =
(n - 1)!. What are the numerical values of Γ(2), Γ(4), and Γ(7)?


Applet Exercise Use the applet Comparison of Gamma Density Functions to obtain the results 
given in Figure 4.16. 


Applet Exercise Refer to Exercise 4.83. Use the applet Comparison of Gamma Density Functions
to compare gamma density functions with (α = 4, β = 1), (α = 40, β = 1), and
(α = 80, β = 1).


a What do you observe about the shapes of these three density functions? Which are less 
skewed and more symmetric? 


b What differences do you observe about the location of the centers of these density functions? 


c Give an explanation for what you observed in part (b).

Applet Exercise Use the applet Comparison of Gamma Density Functions to compare gamma
density functions with (α = 1, β = 1), (α = 1, β = 2), and (α = 1, β = 4).


a What is another name for the density functions that you observed?
b Do these densities have the same general shape?
c The parameter β is a “scale” parameter. What do you observe about the “spread” of these
three density functions?


Applet Exercise When we discussed the χ² distribution in this section, we presented (with
justification to follow in Chapter 6) the fact that if Y is gamma distributed with α = n/2 for
some integer n, then 2Y/β has a χ² distribution. In particular, it was stated that when α = 1.5
and β = 4, W = Y/2 has a χ² distribution with 3 degrees of freedom.


a Use the applet Gamma Probabilities and Quantiles to find P(Y < 3.5).


b Use the applet Gamma Probabilities and Quantiles to find P(W < 1.75). [Hint: Recall that
the χ² distribution with ν degrees of freedom is just a gamma distribution with α = ν/2
and β = 2.]


c Compare your answers to parts (a) and (b). 
Applet Exercise Let Y and W have the distributions given in Exercise 4.86. 


a Use the applet Gamma Probabilities and Quantiles to find the .05-quantile of the distribution
of Y.

b Use the applet Gamma Probabilities and Quantiles to find the .05-quantile of the χ²
distribution with 3 degrees of freedom.

c What is the relationship between the .05-quantile of the gamma (α = 1.5, β = 4) distribution
and the .05-quantile of the χ² distribution with 3 degrees of freedom? Explain this
relationship.


The magnitude of earthquakes recorded in a region of North America can be modeled as 
having an exponential distribution with mean 2.4, as measured on the Richter scale. Find the 
probability that an earthquake striking this region will 


a exceed 3.0 on the Richter scale. 
b fall between 2.0 and 3.0 on the Richter scale. 


The operator of a pumping station has observed that demand for water during early after- 
noon hours has an approximately exponential distribution with mean 100 cfs (cubic feet per 
second). 


a Find the probability that the demand will exceed 200 cfs during the early afternoon on a 
randomly selected day. 


b What water-pumping capacity should the station maintain during early afternoons so 
that the probability that demand will exceed capacity on a randomly selected day is 
only .01? 


Refer to Exercise 4.88. Of the next ten earthquakes to strike this region, what is the probability 
that at least one will exceed 5.0 on the Richter scale? 


If Y has an exponential distribution and P(Y > 2) = .0821, what is 
a β = E(Y)?
b P(Y <1.7)? 


The length of time Y necessary to complete a key operation in the construction of houses has 
an exponential distribution with mean 10 hours. The formula C = 100 + 40Y + 3Y² relates
the cost C of completing this operation to the square of the time to completion. Find the mean
and variance of C. 


Historical evidence indicates that times between fatal accidents on scheduled American do- 
mestic passenger flights have an approximately exponential distribution. Assume that the mean 
time between accidents is 44 days. 


a If one of the accidents occurred on July 1 of a randomly selected year in the study period, 
what is the probability that another accident occurred that same month? 
b What is the variance of the times between accidents? 


One-hour carbon monoxide concentrations in air samples from a large city have an approxi- 
mately exponential distribution with mean 3.6 ppm (parts per million). 


a Find the probability that the carbon monoxide concentration exceeds 9 ppm during a 
randomly selected one-hour period. 

b A traffic-control strategy reduced the mean to 2.5 ppm. Now find the probability that the 
concentration exceeds 9 ppm. 


Let Y be an exponentially distributed random variable with mean β. Define a random variable
X in the following way: X = k if k - 1 ≤ Y < k for k = 1, 2, ....


a Find P(X = k) for each k = 1, 2, ....
b Show that your answer to part (a) can be written as

$$P(X = k) = \left( e^{-1/\beta} \right)^{k-1} \left( 1 - e^{-1/\beta} \right), \qquad k = 1, 2, \ldots,$$

and that X has a geometric distribution with p = (1 - e^{-1/β}).


Suppose that a random variable Y has a probability density function given by 


kye”, у> 0, 
ХО) = 


0, elsewhere. 


a Find the value of k that makes f(y) a density function.
b Does Y have a χ² distribution? If so, how many degrees of freedom?
c What are the mean and standard deviation of Y?
d Applet Exercise What is the probability that Y lies within 2 standard deviations of its
mean?


A manufacturing plant uses a specific bulk product. The amount of product used in one day 
can be modeled by an exponential distribution with β = 4 (measurements in tons). Find the
probability that the plant will use more than 4 tons on a given day. 


Consider the plant of Exercise 4.97. How much of the bulk product should be stocked so that 
the plant's chance of running out of the product is only .05? 


If λ > 0 and α is a positive integer, the relationship between incomplete gamma integrals and
sums of Poisson probabilities is given by

$$\frac{1}{\Gamma(\alpha)} \int_{\lambda}^{\infty} y^{\alpha-1} e^{-y}\, dy = \sum_{x=0}^{\alpha-1} \frac{\lambda^x e^{-\lambda}}{x!}.$$

a If Y has a gamma distribution with α = 2 and β = 1, find P(Y > 1) by using the preceding
equality and Table 3 of Appendix 3.

b Applet Exercise If Y has a gamma distribution with α = 2 and β = 1, find P(Y > 1) by
using the applet Gamma Probabilities.


Let Y be a gamma-distributed random variable where α is a positive integer and β = 1. The
result given in Exercise 4.99 implies that if y > 0,

$$\sum_{x=0}^{\alpha-1} \frac{y^x e^{-y}}{x!} = P(Y > y).$$

Suppose that X_1 is Poisson distributed with mean λ_1 and X_2 is Poisson distributed with mean
λ_2, where λ_2 > λ_1.

a Show that P(X_1 = 0) > P(X_2 = 0).

b Let k be any fixed positive integer. Show that P(X_1 ≤ k) = P(Y > λ_1) and P(X_2 ≤ k) =
P(Y > λ_2), where Y has a gamma distribution with α = k + 1 and β = 1.


c Let k be any fixed positive integer. Use the result derived in part (b) and the fact that λ_2 > λ_1
to show that P(X_1 ≤ k) > P(X_2 ≤ k).


d Because the result in part (c) is valid for any k = 1, 2, 3, ... and part (a) is also valid, we
have established that P(X_1 ≤ k) > P(X_2 ≤ k) for all k = 0, 1, 2, .... Interpret this result.


Applet Exercise Refer to Exercise 4.88. Suppose that the magnitude of earthquakes striking
the region has a gamma distribution with α = .8 and β = 2.4.


a What is the mean magnitude of earthquakes striking the region?

b What is the probability that the magnitude of an earthquake striking the region will exceed
3.0 on the Richter scale?

c Compare your answers to Exercise 4.88(a). Which probability is larger? Explain.


d What is the probability that an earthquake striking the region will fall between 2.0 and 3.0
on the Richter scale?


Applet Exercise Refer to Exercise 4.97. Suppose that the amount of product used in one day 
has a gamma distribution with α = 1.5 and β = 3.


a Find the probability that the plant will use more than 4 tons on a given day. 
b How much of the bulk product should be stocked so that the plant's chance of running out 
of the product is only .05? 


Explosive devices used in mining operations produce nearly circular craters when detonated. 
The radii of these craters are exponentially distributed with mean 10 feet. Find the mean and 
variance of the areas produced by these explosive devices. 


The lifetime (in hours) Y of an electronic component is a random variable with density function 
given by 


0, elsewhere. 


Three of these components operate independently in a piece of equipment. The equipment fails 
if at least two of the components fail. Find the probability that the equipment will operate for 
at least 200 hours without failure. 


Four-week summer rainfall totals in a section of the Midwest United States have approximately 
a gamma distribution with α = 1.6 and β = 2.0.


a Find the mean and variance of the four-week rainfall totals.


b Applet Exercise What is the probability that the four-week rainfall total exceeds 
4 inches? 


The response times on an online computer terminal have approximately a gamma distribution 
with mean four seconds and variance eight seconds².


a Write the probability density function for the response times. 


b Applet Exercise What is the probability that the response time on the terminal is less than 
five seconds? 


Refer to Exercise 4.106. 


a Use Tchebysheff's theorem to give an interval that contains at least 75% of the response 
times. 


b Applet Exercise What is the actual probability of observing a response time in the interval
you obtained in part (a)? 


Annual incomes for heads of household in a section of a city have approximately a gamma 
distribution with α = 20 and β = 1000.


a Find the mean and variance of these incomes.
b Would you expect to find many incomes in excess of $30,000 in this section of the city? 


c Applet Exercise What proportion of heads of households in this section of the city have 
incomes in excess of $30,000? 


The weekly amount of downtime Y (in hours) for an industrial machine has approximately a 
gamma distribution with α = 3 and β = 2. The loss L (in dollars) to the industrial operation
as a result of this downtime is given by L = 30Y + 2Y². Find the expected value and variance
of L. 


If Y has a probability density function given by 


4ye, у> 0, 
ҒО) = 


; elsewhere, 
obtain E(Y) and V (Y) by inspection. 
Suppose that Y has a gamma distribution with parameters α and β.


a If a is any positive or negative value such that α + a > 0, show that

$$E(Y^a) = \frac{\beta^a\,\Gamma(\alpha + a)}{\Gamma(\alpha)}.$$

b Why did your answer in part (a) require that α + a > 0?

c Show that, with a = 1, the result in part (a) gives E(Y) = αβ.

d Use the result in part (a) to give an expression for E(√Y). What do you need to assume
about α?


e Use the result in part (a) to give an expression for E(1/Y), E(1/√Y), and E(1/Y²). What
do you need to assume about α in each case?


Suppose that Y has a χ² distribution with ν degrees of freedom. Use the results in Exercise
4.111 in your answers to the following. These results will be useful when we study the t and
F distributions in Chapter 7.

a Give an expression for E(Y^a) if ν > -2a.
b Why did your answer in part (a) require that ν > -2a?
c Use the result in part (a) to give an expression for E(√Y). What do you need to assume
about ν?

d Use the result in part (a) to give an expression for E(1/Y), E(1/√Y), and E(1/Y²). What
do you need to assume about ν in each case?


The Beta Probability Distribution 


The beta density function is a two-parameter density function defined over the closed
interval 0 ≤ y ≤ 1. It is often used as a model for proportions, such as the proportion
of impurities in a chemical product or the proportion of time that a machine is under 
repair. 


A random variable Y is said to have a beta probability distribution with parameters
α > 0 and β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)}, & 0 \le y \le 1, \\[2ex] 0, & \text{elsewhere,} \end{cases}$$

where

$$B(\alpha, \beta) = \int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\, dy = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$
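The identity between the integral and the ratio of gamma functions can be checked numerically. The sketch below compares the two sides in Python; the function names and the parameter values α = 4, β = 2 are illustrative choices, and the simple midpoint rule used for the integral is only adequate when both parameters are at least 1, so that the integrand stays bounded.

```python
import math

def beta_function(alpha, beta):
    """B(alpha, beta) from the gamma-function formula."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_integral(alpha, beta, steps=100_000):
    """Midpoint-rule approximation of the defining integral over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** (alpha - 1) * (1.0 - y) ** (beta - 1)
    return total * h

print(beta_function(4, 2))   # Gamma(4) Gamma(2) / Gamma(6) = 6/120
print(beta_integral(4, 2))   # numerical value of the integral
```

The reciprocal 1/B(4, 2) = 20 is the normalizing constant of the beta density with these parameters.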


The graphs of beta density functions assume widely differing shapes for various
values of the two parameters α and β. Some of these are shown in Figure 4.17. Some
of the exercises at the end of this section ask you to use the applet Comparison of Beta 
Density Functions accessible at www.thomsonedu.com/statistics/wackerly to explore 
and compare the shapes of more beta densities. 

Notice that defining y over the interval 0 ≤ y ≤ 1 does not restrict the use of
the beta distribution. If c ≤ y ≤ d, then y* = (y - c)/(d - c) defines a new
variable such that 0 ≤ y* ≤ 1. Thus, the beta density function can be applied to a
random variable defined on the interval c ≤ y ≤ d by translation and a change of
scale.

FIGURE 4.17 Beta density functions

The cumulative distribution function for the beta random variable is commonly
called the incomplete beta function and is denoted by

$$F(y) = \int_0^y \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha, \beta)}\, dt = I_y(\alpha, \beta).$$

A tabulation of I_y(α, β) is given in Tables of the Incomplete Beta Function (Pearson,
1968). When α and β are both positive integers, I_y(α, β) is related to the binomial
probability function. Integration by parts can be used to show that for 0 < y < 1,
and α and β both integers,

$$F(y) = \int_0^y \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha, \beta)}\, dt = \sum_{i=\alpha}^{n} \binom{n}{i} y^i (1-y)^{n-i},$$


where n = α + β - 1. Notice that the sum on the right-hand side of this expression
is just the sum of probabilities associated with a binomial random variable
with n = α + β - 1 and p = y. The binomial cumulative distribution function
is presented in Table 1, Appendix 3, for n = 5, 10, 15, 20, and 25 and p = .01,
.05, .10, .20, .30, .40, .50, .60, .70, .80, .90, .95, and .99. The most efficient way
to obtain binomial probabilities is to use statistical software such as R or S-Plus
(see Chapter 3). An even easier way to find probabilities and quantiles associated
with beta-distributed random variables is to use appropriate software directly. The
Thomson website provides an applet, Beta Probabilities, that gives “upper-tail” probabilities
[that is, P(Y > y_0)] and quantiles associated with beta-distributed random
variables. In addition, if Y is a beta-distributed random variable with parameters
α and β, the R (or S-Plus) command pbeta(y_0, α, β) generates P(Y ≤ y_0),
whereas qbeta(p, α, β) yields the pth quantile, the value φ_p such that
P(Y ≤ φ_p) = p.
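When both parameters are positive integers, the binomial sum gives the beta distribution function without any numerical integration. The Python sketch below implements that sum directly; the function name and the values α = 4, β = 2, y = .9 are assumptions made for illustration.

```python
import math

def beta_cdf_integer(y, alpha, beta):
    """F(y) for integer alpha and beta, via the binomial sum with n = alpha + beta - 1."""
    n = alpha + beta - 1
    return sum(
        math.comb(n, i) * y ** i * (1.0 - y) ** (n - i)
        for i in range(alpha, n + 1)
    )

# Lower-tail and upper-tail probabilities at y = .9 for alpha = 4, beta = 2.
lower = beta_cdf_integer(0.9, 4, 2)
print(lower)         # P(Y <= .9)
print(1.0 - lower)   # P(Y > .9), the "upper-tail" probability
```

For α = β = 1 the sum collapses to F(y) = y, which serves as a quick sanity check of the index range.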


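If Y is a beta-distributed random variable with parameters α > 0 and β > 0,
then

$$\mu = E(Y) = \frac{\alpha}{\alpha+\beta} \quad\text{and}\quad \sigma^2 = V(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

Proof

By definition,

$$\begin{aligned}
E(Y) &= \int_{-\infty}^{\infty} y f(y)\, dy \\
&= \int_0^1 y \left[ \frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)} \right] dy \\
&= \frac{1}{B(\alpha, \beta)} \int_0^1 y^{\alpha}(1-y)^{\beta-1}\, dy \\
&= \frac{B(\alpha+1, \beta)}{B(\alpha, \beta)} \qquad \text{(because } \alpha > 0 \text{ implies that } \alpha + 1 > 0\text{)} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \times \frac{\Gamma(\alpha+1)\,\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \times \frac{\alpha\,\Gamma(\alpha)\,\Gamma(\beta)}{(\alpha+\beta)\,\Gamma(\alpha+\beta)} = \frac{\alpha}{\alpha+\beta}.
\end{aligned}$$
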


The derivation of the variance is left to the reader (see Exercise 4.130). 
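Both moment formulas can be checked by integrating the beta density numerically. The following Python sketch approximates E(Y) and E(Y²) with a midpoint rule; the function name, the step count, and the parameter values α = 4, β = 2 are illustrative assumptions, and the method presumes α ≥ 1 and β ≥ 1 so that the density is bounded.

```python
import math

def beta_moment(k, alpha, beta, steps=100_000):
    """Midpoint-rule approximation of E(Y^k) for a beta(alpha, beta) variable."""
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** k * const * y ** (alpha - 1) * (1.0 - y) ** (beta - 1)
    return total * h

alpha, beta = 4, 2   # illustrative values
mean = beta_moment(1, alpha, beta)
variance = beta_moment(2, alpha, beta) - mean ** 2
print(mean, alpha / (alpha + beta))
print(variance, alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
```

Each printed pair should agree to several decimal places. This is a plausibility check on the stated variance, not a substitute for deriving it.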


We will see in the next example that the beta density function can be integrated
directly in the case when α and β are both integers.


EXAMPLE 4.11

A gasoline wholesale distributor has bulk storage tanks that hold fixed supplies and
are filled every Monday. Of interest to the wholesaler is the proportion of this supply
that is sold during the week. Over many weeks of observation, the distributor found
that this proportion could be modeled by a beta distribution with α = 4 and β = 2.
Find the probability that the wholesaler will sell at least 90% of her stock in a given
week.


Solution

If Y denotes the proportion sold during the week, then

$$f(y) = \begin{cases} \dfrac{\Gamma(4+2)}{\Gamma(4)\,\Gamma(2)}\, y^3(1-y), & 0 \le y \le 1, \\[2ex] 0, & \text{elsewhere,} \end{cases}$$

and

$$P(Y > .9) = \int_{.9}^{\infty} f(y)\, dy = \int_{.9}^{1} 20(y^3 - y^4)\, dy = 20 \left\{ \frac{y^4}{4} \Big]_{.9}^{1} - \frac{y^5}{5} \Big]_{.9}^{1} \right\} = 20(.004) = .08.$$

It is not very likely that 90% of the stock will be sold in a given week.
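The arithmetic in this solution can be reproduced from the antiderivative of 20(y³ - y⁴). The Python sketch below does so; the function name upper_tail is a hypothetical label introduced here.

```python
def upper_tail(y0):
    """P(Y > y0) for the beta density with alpha = 4 and beta = 2, where 0 <= y0 <= 1."""
    def antiderivative(y):
        # Antiderivative of 20 * (y^3 - y^4).
        return 20.0 * (y ** 4 / 4.0 - y ** 5 / 5.0)
    return antiderivative(1.0) - antiderivative(y0)

print(round(upper_tail(0.9), 4))   # about 0.0815, which the text rounds to .08
```

Carrying more digits gives roughly .0815 rather than .08; the difference comes only from rounding the bracketed quantity to .004.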
Exercises


Applet Exercise Use the applet Comparison of Beta Density Functions to obtain the results 
given in Figure 4.17. 


Applet Exercise Refer to Exercise 4.113. Use the applet Comparison of Beta Density
Functions to compare beta density functions with (α = 1, β = 1), (α = 1, β = 2), and
(α = 2, β = 1).


a What have we previously called the beta distribution with (α = 1, β = 1)?
b Which of these beta densities is symmetric?
c Which of these beta densities is skewed right?
d Which of these beta densities is skewed left?
*e In Chapter 6 we will see that if Y is beta distributed with parameters α and β, then
Y* = 1 - Y has a beta distribution with parameters α* = β and β* = α. Does this explain
the differences in the graphs of the beta densities?


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density
functions with (α = 2, β = 2), (α = 3, β = 3), and (α = 9, β = 9).


a What are the means associated with random variables with each of these beta distributions?
b What is similar about these densities?
c How do these densities differ? In particular, what do you observe about the "spread" of
these three density functions?

d Calculate the standard deviations associated with random variables with each of these beta
densities. Do the values of these standard deviations explain what you observed in part (c)?
Explain.

e Graph some more beta densities with α = β. What do you conjecture about the shape of
beta densities with α = β?


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta den- 
sity functions with (y = 1.5, В = 7), (а = 2.5, B = 7), and (a = 3.5, B = 7). 


a Are these densities symmetric? Skewed left? Skewed right? 
b What do you observe as the value of o gets closer to 7? 


с Graph some more beta densities with о > 1, В > 1, апаа < В. What do you conjecture 
about the shape of beta densities when both œ > 1, В > 1, anda < В? 


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta den- 
sity functions with (о = 9, В = 7), (о = 10, В = 7), and (a = 12, B = 7). 


a Аге these densities symmetric? Skewed left? Skewed right? 
What do you observe as the value of o gets closer to 12? 


с Graph some more beta densities with a > 1, В > 1, апаа > В. What do you conjecture 
about the shape of beta densities with a > В and both a > 1 and В > 1? 


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta
density functions with (α = .3, β = 4), (α = .3, β = 7), and (α = .3, β = 12).

a Are these densities symmetric? Skewed left? Skewed right?
b What do you observe as the value of β gets closer to 12?
c Which of these beta distributions gives the highest probability of observing a value larger
than 0.2?
d Graph some more beta densities with α < 1 and β > 1. What do you conjecture about the
shape of beta densities with α < 1 and β > 1?


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta
density functions with (α = 4, β = 0.3), (α = 7, β = 0.3), and (α = 12, β = 0.3).

a Are these densities symmetric? Skewed left? Skewed right?
b What do you observe as the value of α gets closer to 12?
c Which of these beta distributions gives the highest probability of observing a value less
than 0.8?
d Graph some more beta densities with α > 1 and β < 1. What do you conjecture about the
shape of beta densities with α > 1 and β < 1?


In Chapter 6 we will see that if Y is beta distributed with parameters α and β, then Y* = 1 − Y
has a beta distribution with parameters α* = β and β* = α. Does this explain the differences
and similarities in the graphs of the beta densities in Exercises 4.118 and 4.119?


Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density
functions with (α = 0.5, β = 0.7), (α = 0.7, β = 0.7), and (α = 0.9, β = 0.7).

a What is the general shape of these densities?
b What do you observe as the value of α gets larger?


Applet Exercise Beta densities with α < 1 and β < 1 are difficult to display because of
scaling/resolution problems.

a Use the applet Beta Probabilities and Quantiles to compute P(Y > 0.1) if Y has a beta
distribution with (α = 0.1, β = 2).
b Use the applet Beta Probabilities and Quantiles to compute P(Y < 0.1) if Y has a beta
distribution with (α = 0.1, β = 2).
c Based on your answer to part (b), which values of Y are assigned high probabilities if Y
has a beta distribution with (α = 0.1, β = 2)?
d Use the applet Beta Probabilities and Quantiles to compute P(Y < 0.1) if Y has a beta
distribution with (α = 0.1, β = 0.2).
e Use the applet Beta Probabilities and Quantiles to compute P(Y > 0.9) if Y has a beta
distribution with (α = 0.1, β = 0.2).
f Use the applet Beta Probabilities and Quantiles to compute P(0.1 < Y < 0.9) if Y has a
beta distribution with (α = 0.1, β = 0.2).
g Based on your answers to parts (d), (e), and (f), which values of Y are assigned high
probabilities if Y has a beta distribution with (α = 0.1, β = 0.2)?


The relative humidity Y, when measured at a location, has a probability density function
given by

\[
f(y) = \begin{cases} k\,y^{3}(1-y)^{2}, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of k that makes f(y) a density function.
b Applet Exercise Use the applet Beta Probabilities and Quantiles to find a humidity value
that is exceeded only 5% of the time.


The percentage of impurities per batch in a chemical product is a random variable Y with
density function

\[
f(y) = \begin{cases} 12\,y^{2}(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

A batch with more than 40% impurities cannot be sold.

a Integrate the density directly to determine the probability that a randomly selected batch
cannot be sold because of excessive impurities.
b Applet Exercise Use the applet Beta Probabilities and Quantiles to find the answer to
part (a).


Refer to Exercise 4.124. Find the mean and variance of the percentage of impurities in a
randomly selected batch of the chemical.


The weekly repair cost Y for a machine has a probability density function given by

\[
f(y) = \begin{cases} 3\,(1-y)^{2}, & 0 < y < 1, \\ 0, & \text{elsewhere,} \end{cases}
\]

with measurements in hundreds of dollars. How much money should be budgeted each week
for repair costs so that the actual cost will exceed the budgeted amount only 10% of the time?


Verify that if Y has a beta distribution with α = β = 1, then Y has a uniform distribution
over (0, 1). That is, the uniform distribution over the interval (0, 1) is a special case of a beta
distribution.


Suppose that a random variable Y has a probability density function given by

\[
f(y) = \begin{cases} 6\,y(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find F(y).
b Graph F(y) and f(y).
c Find P(.5 ≤ Y ≤ .8).


During an eight-hour shift, the proportion of time Y that a sheet-metal stamping machine is
down for maintenance or repairs has a beta distribution with α = 1 and β = 2. That is,

\[
f(y) = \begin{cases} 2\,(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

The cost (in hundreds of dollars) of this downtime, due to lost production and cost of
maintenance and repair, is given by C = 10 + 20Y + 4Y². Find the mean and variance of C.


Prove that the variance of a beta-distributed random variable with parameters α and β is

\[
\sigma^{2} = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}.
\]


Errors in measuring the time of arrival of a wave front from an acoustic source sometimes have
an approximate beta distribution. Suppose that these errors, measured in microseconds, have
approximately a beta distribution with α = 1 and β = 2.

a What is the probability that the measurement error in a randomly selected instance is less
than .5 μs?
b Give the mean and standard deviation of the measurement errors.


Proper blending of fine and coarse powders prior to copper sintering is essential for uniformity
in the finished product. One way to check the homogeneity of the blend is to select many small
samples of the blended powders and measure the proportion of the total weight contributed by
the fine powders in each. These measurements should be relatively stable if a homogeneous
blend has been obtained.

a Suppose that the proportion of total weight contributed by the fine powders has a beta
distribution with α = β = 3. Find the mean and variance of the proportion of weight
contributed by the fine powders.
b Repeat part (a) if α = β = 2.
c Repeat part (a) if α = β = 1.
d Which of the cases, parts (a), (b), or (c), yields the most homogeneous blending?


The proportion of time per day that all checkout counters in a supermarket are busy is a random
variable Y with a density function given by

\[
f(y) = \begin{cases} c\,y^{2}(1-y)^{4}, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of c that makes f(y) a probability density function.
b Find E(Y). (Use what you have learned about the beta-type distribution. Compare your
answers to those obtained in Exercise 4.28.)
c Calculate the standard deviation of Y.
d Applet Exercise Use the applet Beta Probabilities and Quantiles to find P(Y > μ + 2σ).

In the text of this section, we noted the relationship between the distribution function of a
beta-distributed random variable and sums of binomial probabilities. Specifically, if Y has a
beta distribution with positive integer values for α and β and 0 < y < 1,

\[
F(y) = \int_{0}^{y} \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha,\beta)}\,dt
= \sum_{i=\alpha}^{n} \binom{n}{i}\, y^{i}(1-y)^{n-i},
\]

where n = α + β − 1.

a If Y has a beta distribution with α = 4 and β = 7, use the appropriate binomial tables to
find P(Y ≤ .7) = F(.7).
b If Y has a beta distribution with α = 12 and β = 14, use the appropriate binomial tables
to find P(Y ≤ .6) = F(.6).
c Applet Exercise Use the applet Beta Probabilities and Quantiles to find the probabilities
in parts (a) and (b).
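The beta and binomial relationship stated in this exercise can be sanity-checked numerically. The following sketch is an illustration under our own naming (neither function comes from the text): it evaluates the binomial sum and compares it with a midpoint-rule integral of the beta density, using the parameters of Example 4.11 (α = 4, β = 2) rather than those of parts (a) and (b).

```python
from math import comb, gamma


def beta_cdf_via_binomial(y, alpha, beta):
    """F(y) for integer alpha, beta as a sum of binomial probabilities, n = alpha + beta - 1."""
    n = alpha + beta - 1
    return sum(comb(n, i) * y ** i * (1 - y) ** (n - i) for i in range(alpha, n + 1))


def beta_cdf_numeric(y, alpha, beta, steps=20_000):
    """Midpoint-rule approximation to the integral of the beta density over (0, y)."""
    c = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    h = y / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += t ** (alpha - 1) * (1 - t) ** (beta - 1)
    return c * total * h


by_sum = beta_cdf_via_binomial(0.9, 4, 2)
by_integral = beta_cdf_numeric(0.9, 4, 2)
print(round(by_sum, 5), round(by_integral, 5))
```

Both values agree with 1 − P(Y > .9) from Example 4.11, that is, about .9185.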


Suppose that Y₁ and Y₂ are binomial random variables with parameters (n, p₁) and (n, p₂),
respectively, where p₁ < p₂. (Note that the parameter n is the same for the two variables.)

a Use the binomial formula to deduce that P(Y₁ = 0) > P(Y₂ = 0).
b Use the relationship between the beta distribution function and sums of binomial
probabilities given in Exercise 4.134 to deduce that, if k is an integer between 1 and n − 1,

\[
P(Y_1 \le k) = \sum_{i=0}^{k} \binom{n}{i} p_1^{\,i}(1-p_1)^{n-i}
= \int_{p_1}^{1} \frac{t^{k}(1-t)^{n-k-1}}{B(k+1,\,n-k)}\,dt.
\]

c If k is an integer between 1 and n − 1, the same argument used in part (b) yields that

\[
P(Y_2 \le k) = \sum_{i=0}^{k} \binom{n}{i} p_2^{\,i}(1-p_2)^{n-i}
= \int_{p_2}^{1} \frac{t^{k}(1-t)^{n-k-1}}{B(k+1,\,n-k)}\,dt.
\]

Show that, if k is any integer between 1 and n − 1, P(Y₁ ≤ k) > P(Y₂ ≤ k). Interpret this
result.


Some General Comments 


Keep in mind that density functions are theoretical models for populations of real 
data that occur in random phenomena. How do we know which model to use? How 
much does it matter if we use the wrong density as our model for reality? 

To answer the latter question first, we are unlikely ever to select a density function 
that provides a perfect representation of nature; but goodness of fit is not the criterion 
for assessing the adequacy of our model. The purpose of a probabilistic model is to 
provide the mechanism for making inferences about a population based on informa- 
tion contained in a sample. The probability of the observed sample (or a quantity 
proportional to it) is instrumental in making an inference about the population. It 
follows that a density function that provides a poor fit to the population frequency 
distribution could (but does not necessarily) yield incorrect probability statements and 
lead to erroneous inferences about the population. A good model is one that yields
good inferences about the population of interest. 

Selecting a reasonable model is sometimes a matter of acting on theoretical consid- 
erations. Often, for example, a situation in which the discrete Poisson random variable 
is appropriate is indicated by the random behavior of events in time. Knowing this, 
we can show that the length of time between any adjacent pair of events follows an 
exponential distribution. Similarly, if a and b are integers, a < b, then the length of 
time between the occurrences of the ath and bth events possesses a gamma distribution
with α = b − a. We will later encounter a theorem (called the central limit
theorem) that outlines some conditions that imply that a normal distribution would 
be a suitable approximation for the distribution of data. 

A second way to select a model is to form a frequency histogram (Chapter 1) 
for data drawn from the population and to choose a density function that would visually
appear to give a similar frequency curve. For example, if a set of n = 100
sample measurements yielded a bell-shaped frequency distribution, we might con- 
clude that the normal density function would adequately model the population fre- 
quency distribution. 

Not all model selection is completely subjective. Statistical procedures are avail- 
able to test a hypothesis that a population frequency distribution is of a particular 
type. We can also calculate a measure of goodness of fit for several distributions 
and select the best. Studies of many common inferential methods have been made 
to determine the magnitude of the errors of inference introduced by incorrect pop- 
ulation models. It is comforting to know that many statistical methods of inference 
are insensitive to assumptions about the form of the underlying population frequency 
distribution. 

The uniform, normal, gamma, and beta distributions offer an assortment of density
functions that fit many population frequency distributions. Another, the Weibull
distribution, appears in the exercises at the end of the chapter. 


Other Expected Values 


Moments for continuous random variables have definitions analogous to those given 
for the discrete case. 


If Y is a continuous random variable, then the kth moment about the origin is
given by

\[ \mu'_k = E(Y^{k}), \qquad k = 1, 2, \ldots. \]

The kth moment about the mean, or the kth central moment, is given by

\[ \mu_k = E[(Y - \mu)^{k}], \qquad k = 1, 2, \ldots. \]

Notice that for k = 1, $\mu'_1 = \mu$, and for k = 2, $\mu_2 = V(Y) = \sigma^{2}$.


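EXAMPLE 4.12

Find $\mu'_k$ for the uniform random variable with θ₁ = 0 and θ₂ = θ.


Solution

By definition,

\[
\mu'_k = E(Y^{k}) = \int_{-\infty}^{\infty} y^{k} f(y)\,dy
= \int_{0}^{\theta} y^{k}\left(\frac{1}{\theta}\right) dy
= \left.\frac{y^{k+1}}{\theta(k+1)}\right|_{0}^{\theta} = \frac{\theta^{k}}{k+1}.
\]

Thus,

\[
\mu'_1 = \frac{\theta}{2}, \qquad \mu'_2 = \frac{\theta^{2}}{3}, \qquad \mu'_3 = \frac{\theta^{3}}{4},
\]

and so on.


DEFINITION 4.14

If Y is a continuous random variable, then the moment-generating function of
Y is given by

\[ m(t) = E(e^{tY}). \]

The moment-generating function is said to exist if there exists a constant b > 0
such that m(t) is finite for |t| ≤ b.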


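This is simply the continuous analogue of Definition 3.14. That m(t) generates
moments is established in exactly the same manner as in Section 3.9. If m(t) exists,

\[
\begin{aligned}
E(e^{tY}) &= \int_{-\infty}^{\infty} e^{ty} f(y)\,dy
= \int_{-\infty}^{\infty} \left(1 + ty + \frac{t^{2}y^{2}}{2!} + \frac{t^{3}y^{3}}{3!} + \cdots\right) f(y)\,dy \\
&= \int_{-\infty}^{\infty} f(y)\,dy + t\int_{-\infty}^{\infty} y f(y)\,dy + \frac{t^{2}}{2!}\int_{-\infty}^{\infty} y^{2} f(y)\,dy + \cdots \\
&= 1 + t\mu'_1 + \frac{t^{2}}{2!}\mu'_2 + \frac{t^{3}}{3!}\mu'_3 + \cdots.
\end{aligned}
\]

Notice that the moment-generating function,

\[ m(t) = 1 + t\mu'_1 + \frac{t^{2}}{2!}\mu'_2 + \cdots, \]

takes the same form for both discrete and continuous random variables. Hence,
Theorem 3.12 holds for continuous random variables, and

\[ \left.\frac{d^{k} m(t)}{dt^{k}}\right|_{t=0} = m^{(k)}(0) = \mu'_k. \]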
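As a rough numerical illustration of the series form of m(t), the sketch below uses the uniform moments θᵏ/(k + 1) from Example 4.12 and compares the truncated series with a direct midpoint-rule evaluation of E(e^{tY}). The function names, the choice θ = 2 and t = 0.7, and the truncation at 30 terms are our own assumptions.

```python
from math import exp, factorial


def uniform_moment(k, theta):
    """kth moment about the origin of a uniform(0, theta) variable (Example 4.12)."""
    return theta ** k / (k + 1)


def mgf_from_moments(t, theta, terms=30):
    """Truncated series 1 + t*mu'_1 + t^2*mu'_2/2! + ... built from the moments."""
    return sum(t ** k / factorial(k) * uniform_moment(k, theta) for k in range(terms))


def mgf_by_integration(t, theta, steps=20_000):
    """Midpoint-rule approximation to E(e^{tY}) = integral of e^{ty} * (1/theta) over (0, theta)."""
    h = theta / steps
    return sum(exp(t * (j + 0.5) * h) for j in range(steps)) * h / theta


series_value = mgf_from_moments(0.7, 2.0)
integral_value = mgf_by_integration(0.7, 2.0)
print(round(series_value, 6), round(integral_value, 6))
```

The two numbers agree to many decimal places, which is what the term-by-term integration above predicts.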


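EXAMPLE 4.13

Find the moment-generating function for a gamma-distributed random variable.


Solution

\[
\begin{aligned}
m(t) = E(e^{tY}) &= \int_{0}^{\infty} e^{ty}\left[\frac{y^{\alpha-1}e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\right] dy \\
&= \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_{0}^{\infty} y^{\alpha-1}\exp\left[-y\left(\frac{1}{\beta} - t\right)\right] dy \\
&= \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_{0}^{\infty} y^{\alpha-1}\exp\left[\frac{-y}{\beta/(1-\beta t)}\right] dy.
\end{aligned}
\]

[The term exp(·) is simply a more convenient way to write e^{(·)} when the term in the
exponent is long or complex.]

To complete the integration, notice that the integral of the variable factor of any
density function must equal the reciprocal of the constant factor. That is, if f(y) =
cg(y), where c is a constant, then

\[
\int_{-\infty}^{\infty} f(y)\,dy = \int_{-\infty}^{\infty} c\,g(y)\,dy = 1
\quad\text{and so}\quad \int_{-\infty}^{\infty} g(y)\,dy = \frac{1}{c}.
\]

Applying this result to the integral in m(t) and noting that if [β/(1 − βt)] > 0 (or,
equivalently, if t < 1/β),

\[ g(y) = y^{\alpha-1} \times \exp\{-y/[\beta/(1-\beta t)]\} \]

is the variable factor of a gamma density function with parameters α > 0 and
[β/(1 − βt)] > 0, we obtain

\[
m(t) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\left[\left(\frac{\beta}{1-\beta t}\right)^{\alpha}\Gamma(\alpha)\right]
= \frac{1}{(1-\beta t)^{\alpha}} \quad\text{for } t < \frac{1}{\beta}.
\]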
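The closed form from Example 4.13 can be compared with a direct numerical evaluation of E(e^{tY}). This is a minimal sketch under assumptions of our own: the parameter values α = 3, β = 2, t = 0.2, the truncation point of the integral, and the function names are illustrative choices, not part of the text.

```python
from math import exp, gamma


def gamma_pdf(y, alpha, beta):
    """Gamma density y^(alpha-1) e^(-y/beta) / (beta^alpha Gamma(alpha)) for y > 0."""
    return y ** (alpha - 1) * exp(-y / beta) / (beta ** alpha * gamma(alpha))


def mgf_closed_form(t, alpha, beta):
    """m(t) = (1 - beta*t)^(-alpha), valid only for t < 1/beta."""
    if t >= 1.0 / beta:
        raise ValueError("the moment-generating function requires t < 1/beta")
    return (1.0 - beta * t) ** (-alpha)


def mgf_numeric(t, alpha, beta, upper=400.0, steps=200_000):
    """Midpoint-rule approximation to the integral of e^{ty} f(y) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        total += exp(t * y) * gamma_pdf(y, alpha, beta)
    return total * h


closed = mgf_closed_form(0.2, 3, 2.0)
numeric = mgf_numeric(0.2, 3, 2.0)
print(round(closed, 4), round(numeric, 4))
```

The restriction t < 1/β matters in practice: for t at or above 1/β the integrand no longer decays and the integral diverges, which is why the sketch raises an error in that case.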

The moments $\mu'_k$ can be extracted from the moment-generating function by differentiating
with respect to t (in accordance with Theorem 3.12) or by expanding the
function into a power series in t. We will demonstrate the latter approach.


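EXAMPLE 4.14

Expand the moment-generating function of Example 4.13 into a power series in t and
thereby obtain $\mu'_k$.


Solution

From Example 4.13, m(t) = 1/(1 − βt)^α = (1 − βt)^{−α}. Using the expansion for a
binomial term of the form (x + y)^{−c}, we have

\[
\begin{aligned}
m(t) = (1-\beta t)^{-\alpha} &= 1 + (-\alpha)(1)^{-\alpha-1}(-\beta t)
+ \frac{(-\alpha)(-\alpha-1)(1)^{-\alpha-2}(-\beta t)^{2}}{2!} + \cdots \\
&= 1 + t(\alpha\beta) + \frac{t^{2}[\alpha(\alpha+1)\beta^{2}]}{2!}
+ \frac{t^{3}[\alpha(\alpha+1)(\alpha+2)\beta^{3}]}{3!} + \cdots.
\end{aligned}
\]

Because $\mu'_k$ is the coefficient of $t^{k}/k!$, we find, by inspection,

\[
\begin{aligned}
\mu'_1 &= \mu = \alpha\beta, \\
\mu'_2 &= \alpha(\alpha+1)\beta^{2}, \\
\mu'_3 &= \alpha(\alpha+1)(\alpha+2)\beta^{3},
\end{aligned}
\]

and, in general, $\mu'_k = \alpha(\alpha+1)(\alpha+2)\cdots(\alpha+k-1)\beta^{k}$. Notice that $\mu'_1$ and $\mu'_2$
agree with the results of Theorem 4.8. Moreover, these results agree with the result
of Exercise 4.111(a).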


We have already explained the importance of the expected values of $Y^{k}$,
$(Y - \mu)^{k}$, and $e^{tY}$, all of which provide important information about the distribution
of Y. Sometimes, however, we are interested in the expected value of a function
of a random variable as an end in itself. (We also may be interested in the probability 
distribution of functions of random variables, but we defer discussion of this topic 
until Chapter 6.) 


EXAMPLE 4.15

The kinetic energy k associated with a mass m moving at velocity v is given by the
expression

\[ k = \frac{mv^{2}}{2}. \]

Consider a device that fires a serrated nail into concrete at a mean velocity of 2000
feet per second, where the random velocity V possesses a density function given by

\[ f(v) = \frac{v^{3} e^{-v/500}}{(500)^{4}\,\Gamma(4)}, \qquad v \ge 0. \]

Find the expected kinetic energy associated with a nail of mass m.


Solution

Let K denote the random kinetic energy associated with the nail. Then

\[ E(K) = E\left(\frac{mV^{2}}{2}\right) = \frac{m}{2}\,E(V^{2}), \]

by Theorem 4.5, part 2. The random variable V has a gamma distribution with
α = 4 and β = 500. Therefore, $E(V^{2}) = \mu'_2$ for the random variable V. Referring
to Example 4.14, we have $\mu'_2 = \alpha(\alpha+1)\beta^{2} = 4(5)(500)^{2} = 5{,}000{,}000$. Therefore,

\[ E(K) = \frac{m}{2}\,E(V^{2}) = \frac{m}{2}\,(5{,}000{,}000) = 2{,}500{,}000\,m. \]
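The computation in Example 4.15 reduces to the general gamma moment formula of Example 4.14. The short sketch below is illustrative only; the function names are hypothetical, and the mass is passed in as a plain number in whatever units the reader chooses.

```python
def gamma_raw_moment(k, alpha, beta):
    """kth moment about the origin of a gamma variable:
    alpha * (alpha + 1) * ... * (alpha + k - 1) * beta^k (Example 4.14)."""
    product = 1.0
    for j in range(k):
        product *= alpha + j
    return product * beta ** k


def expected_kinetic_energy(mass, alpha=4, beta=500):
    """E(K) = (m / 2) * E(V^2) when the velocity V is gamma(alpha, beta)."""
    return mass / 2.0 * gamma_raw_moment(2, alpha, beta)


mean_velocity = gamma_raw_moment(1, 4, 500)
second_moment = gamma_raw_moment(2, 4, 500)
print(mean_velocity, second_moment, expected_kinetic_energy(1.0))
```

With a unit mass the last value is the coefficient 2,500,000 found in the example, and the first value reproduces the stated mean velocity of 2000 feet per second.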


Finding the moments of a function of a random variable is frequently facilitated
by using its moment-generating function.


THEOREM 4.12

Let Y be a random variable with density function f(y) and g(Y) be a function
of Y. Then the moment-generating function for g(Y) is

\[ E[e^{tg(Y)}] = \int_{-\infty}^{\infty} e^{tg(y)} f(y)\,dy. \]


This theorem follows directly from Definition 4.14 and Theorem 4.4. 


EXAMPLE 4.16

Let g(Y) = Y − μ, where Y is a normally distributed random variable with mean μ
and variance σ². Find the moment-generating function for g(Y).


Solution

The moment-generating function of g(Y) is given by

\[
m(t) = E[e^{tg(Y)}] = E[e^{t(Y-\mu)}]
= \int_{-\infty}^{\infty} e^{t(y-\mu)}\left[\frac{\exp[-(y-\mu)^{2}/2\sigma^{2}]}{\sigma\sqrt{2\pi}}\right] dy.
\]

To integrate, let u = y − μ. Then du = dy and

\[
\begin{aligned}
m(t) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tu}\,e^{-u^{2}/(2\sigma^{2})}\,du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\left(\frac{1}{2\sigma^{2}}\right)(u^{2} - 2\sigma^{2}tu)\right] du.
\end{aligned}
\]

Complete the square in the exponent of e by multiplying and dividing by $e^{t^{2}\sigma^{2}/2}$. Then

\[
\begin{aligned}
m(t) &= e^{t^{2}\sigma^{2}/2} \int_{-\infty}^{\infty} \frac{\exp[-(1/2\sigma^{2})(u^{2} - 2\sigma^{2}tu + \sigma^{4}t^{2})]}{\sigma\sqrt{2\pi}}\,du \\
&= e^{t^{2}\sigma^{2}/2} \int_{-\infty}^{\infty} \frac{\exp[-(u - \sigma^{2}t)^{2}/2\sigma^{2}]}{\sigma\sqrt{2\pi}}\,du.
\end{aligned}
\]

The function inside the integral is a normal density function with mean σ²t and
variance σ². (See the equation for the normal density function in Section 4.5.) Hence,
the integral is equal to 1, and

\[ m(t) = e^{(t^{2}/2)\sigma^{2}}. \]

The moments of U = Y − μ can be obtained from m(t) by differentiating m(t) in
accordance with Theorem 3.12 or by expanding m(t) into a series.
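To illustrate the series route mentioned above: expanding e^{σ²t²/2} as Σ (σ²/2)ʲ t^{2j}/j! and reading off the coefficient of tᵏ/k! gives zero for odd k and (2j)!(σ²/2)ʲ/j! for k = 2j. The sketch below compares those values with a direct numerical integration against the normal density of U; the function names, σ = 1.5, and the integration range are our own assumptions.

```python
from math import exp, factorial, pi, sqrt


def central_moment_from_mgf(k, sigma):
    """Coefficient of t^k / k! in the series expansion of exp(sigma^2 * t^2 / 2)."""
    if k % 2 == 1:
        return 0.0
    j = k // 2
    return factorial(k) * (sigma ** 2 / 2.0) ** j / factorial(j)


def central_moment_numeric(k, sigma, width=12.0, steps=200_000):
    """Midpoint-rule approximation to E(U^k) for U normal with mean 0 and variance sigma^2."""
    lower = -width * sigma
    h = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * h
        total += u ** k * exp(-u * u / (2.0 * sigma ** 2))
    return total * h / (sigma * sqrt(2.0 * pi))


for k in (1, 2, 3, 4):
    print(k, central_moment_from_mgf(k, 1.5), round(central_moment_numeric(k, 1.5), 6))
```

For k = 2 the series gives σ², and for k = 4 it gives 3σ⁴, both matching the numerical integrals.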


The purpose of the preceding discussion of moments is twofold. First, moments
can be used as numerical descriptive measures to describe the data that we obtain in
an experiment. Second, they can be used in a theoretical sense to prove that a random
variable possesses a particular probability distribution. It can be shown that if two
random variables Y and Z possess identical moment-generating functions, then Y
and Z possess identical probability distributions. This latter application of moments
was mentioned in the discussion of moment-generating functions for discrete random
variables in Section 3.9; it applies to continuous random variables as well.

For your convenience, the probability and density functions, means, variances,
and moment-generating functions for some common random variables are given in
Appendix 2 and inside the back cover of this text.


Exercises


4.136 Suppose that the waiting time for the first customer to enter a retail shop after 9:00 A.M. is a
random variable Y with an exponential density function given by

\[
f(y) = \begin{cases} \left(\dfrac{1}{\theta}\right) e^{-y/\theta}, & y > 0, \\[4pt] 0, & \text{elsewhere.} \end{cases}
\]

a Find the moment-generating function for Y.
b Use the answer from part (a) to find E(Y) and V(Y).


4.137 Show that the result given in Exercise 3.158 also holds for continuous random variables. That
is, show that, if Y is a random variable with moment-generating function m(t) and U is given
by U = aY + b, the moment-generating function of U is $e^{tb}m(at)$. If Y has mean μ and
variance σ², use the moment-generating function of U to derive the mean and variance of U.


4.138 Example 4.16 derives the moment-generating function for Y − μ, where Y is normally
distributed with mean μ and variance σ².

a Use the results in Example 4.16 and Exercise 4.137 to find the moment-generating function
for Y.
b Differentiate the moment-generating function found in part (a) to show that E(Y) = μ and
V(Y) = σ².


4.139 The moment-generating function of a normally distributed random variable, Y, with mean
μ and variance σ² was shown in Exercise 4.138 to be $m(t) = e^{\mu t + (1/2)t^{2}\sigma^{2}}$. Use the result
in Exercise 4.137 to derive the moment-generating function of X = −3Y + 4. What is the
distribution of X? Why?


4.140 Identify the distributions of the random variables with the following moment-generating
functions:

a $m(t) = (1 - 4t)^{-2}$.
b $m(t) = 1/(1 - 3.2t)$.
c $m(t) = e^{-5t + 6t^{2}}$.


4.141 If θ₁ < θ₂, derive the moment-generating function of a random variable that has a uniform
distribution on the interval (θ₁, θ₂).


4.142 Refer to Exercises 4.141 and 4.137. Suppose that Y is uniformly distributed on the interval
(0, 1) and that a > 0 is a constant.

a Give the moment-generating function for Y.
b Derive the moment-generating function of W = aY. What is the distribution of W? Why?
c Derive the moment-generating function of X = −aY. What is the distribution of X? Why?
d If b is a fixed constant, derive the moment-generating function of V = aY + b. What is
the distribution of V? Why?


4.143 The moment-generating function for the gamma random variable is derived in Example 4.13.
Differentiate this moment-generating function to find the mean and variance of the gamma
distribution.


4.144 Consider a random variable Y with density function given by

\[ f(y) = k\,e^{-y^{2}/2}, \qquad -\infty < y < \infty. \]

a Find k.
b Find the moment-generating function of Y.
c Find E(Y) and V(Y).


4.145 A random variable Y has the density function

\[
f(y) = \begin{cases} e^{y}, & y < 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $E(e^{3Y/2})$.
b Find the moment-generating function for Y.
c Find V(Y).


Tchebysheff's Theorem


As was the case for discrete random variables, an interpretation of μ and σ for
continuous random variables is provided by the empirical rule and Tchebysheff's
theorem. Even if the exact distributions are unknown for random variables of interest,
knowledge of the associated means and standard deviations permits us to deduce
meaningful bounds for the probabilities of events that are often of interest.

We stated and utilized Tchebysheff's theorem in Section 3.11. We now restate this
theorem and give a proof applicable to a continuous random variable.


THEOREM 4.13

Tchebysheff's Theorem Let Y be a random variable with finite mean μ and
variance σ². Then, for any k > 0,

\[
P(|Y - \mu| < k\sigma) \ge 1 - \frac{1}{k^{2}} \quad\text{or}\quad P(|Y - \mu| \ge k\sigma) \le \frac{1}{k^{2}}.
\]

Proof We will give the proof for a continuous random variable. The proof for the
discrete case proceeds similarly. Let f(y) denote the density function of Y.
Then

\[
\begin{aligned}
V(Y) = \sigma^{2} &= \int_{-\infty}^{\infty} (y-\mu)^{2} f(y)\,dy \\
&= \int_{-\infty}^{\mu-k\sigma} (y-\mu)^{2} f(y)\,dy + \int_{\mu-k\sigma}^{\mu+k\sigma} (y-\mu)^{2} f(y)\,dy
+ \int_{\mu+k\sigma}^{\infty} (y-\mu)^{2} f(y)\,dy.
\end{aligned}
\]

The second integral is always greater than or equal to zero, and (y − μ)² ≥ k²σ²
for all values of y between the limits of integration for the first and third integrals;
that is, the regions of integration are in the tails of the density function and cover
only values of y for which (y − μ)² ≥ k²σ². Replace the second integral by
zero and substitute k²σ² for (y − μ)² in the first and third integrals to obtain
the inequality

\[
V(Y) = \sigma^{2} \ge \int_{-\infty}^{\mu-k\sigma} k^{2}\sigma^{2} f(y)\,dy + \int_{\mu+k\sigma}^{\infty} k^{2}\sigma^{2} f(y)\,dy.
\]

Then

\[
\sigma^{2} \ge k^{2}\sigma^{2}\left[\int_{-\infty}^{\mu-k\sigma} f(y)\,dy + \int_{\mu+k\sigma}^{\infty} f(y)\,dy\right],
\]

or

\[
\sigma^{2} \ge k^{2}\sigma^{2}\,[P(Y \le \mu - k\sigma) + P(Y \ge \mu + k\sigma)] = k^{2}\sigma^{2}\,P(|Y - \mu| \ge k\sigma).
\]

Dividing by k²σ², we obtain

\[ P(|Y - \mu| \ge k\sigma) \le \frac{1}{k^{2}}, \]

or, equivalently,

\[ P(|Y - \mu| < k\sigma) \ge 1 - \frac{1}{k^{2}}. \]

One real value of Tchebysheff’s theorem is that it enables us to find bounds for 
probabilities that ordinarily would have to be obtained by tedious mathematical ma- 
nipulations (integration or summation). Further, we often can obtain means and vari- 
ances of random variables (see Example 4.15) without specifying the distribution of 
the variable. In situations like these, Tchebysheff’s theorem still provides meaningful 
bounds for probabilities of interest. 


EXAMPLE 4.17

Suppose that experience has shown that the length of time Y (in minutes) required
to conduct a periodic maintenance check on a dictating machine follows a gamma
distribution with α = 3.1 and β = 2. A new maintenance worker takes 22.5 minutes to
check the machine. Does this length of time to perform a maintenance check disagree
with prior experience?


Solution

The mean and variance for the length of maintenance check times (based on prior
experience) are (from Theorem 4.8)

\[ \mu = \alpha\beta = (3.1)(2) = 6.2 \quad\text{and}\quad \sigma^{2} = \alpha\beta^{2} = (3.1)(2^{2}) = 12.4. \]

It follows that σ = √12.4 = 3.52. Notice that y = 22.5 minutes exceeds the mean
μ = 6.2 minutes by 16.3 minutes, or k = 16.3/3.52 = 4.63 standard deviations.
Then from Tchebysheff's theorem,

\[
P(|Y - 6.2| \ge 16.3) = P(|Y - \mu| \ge 4.63\sigma) \le \frac{1}{(4.63)^{2}} = .0466.
\]

This probability is based on the assumption that the distribution of maintenance
times has not changed from prior experience. Then, observing that P(Y ≥ 22.5) is
small, we must conclude either that our new maintenance worker has generated by
chance a lengthy maintenance time that occurs with low probability or that the new
worker is somewhat slower than preceding ones. Considering the low probability for
P(Y ≥ 22.5), we favor the latter view.


The exact probability, P(Y ≥ 22.5), for Example 4.17 would require evaluation
of the integral

\[
P(Y \ge 22.5) = \int_{22.5}^{\infty} \frac{y^{2.1} e^{-y/2}}{2^{3.1}\,\Gamma(3.1)}\,dy.
\]

Although we could utilize tables given by Pearson (1965) to evaluate this integral, we
cannot evaluate it directly. We could, of course, use R or S-Plus or one of the provided
applets to numerically evaluate this probability. Unless we use statistical software,
similar integrals are difficult to evaluate for the beta density and for many other
density functions. Tchebysheff's theorem often provides quick bounds for probabilities
while circumventing laborious integration, utilization of software, or searches for
appropriate tables.
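For comparison, the Tchebysheff bound of Example 4.17 and a numerical approximation to the exact tail integral can both be computed in a few lines. This is a sketch under our own assumptions: it uses Python rather than the R, S-Plus, or applets mentioned in the text, the function names are hypothetical, and the integral is approximated by a midpoint rule truncated 200 units above 22.5, where the integrand is negligible.

```python
from math import exp, gamma, sqrt

ALPHA, BETA = 3.1, 2.0  # gamma parameters from Example 4.17
OBSERVED = 22.5         # minutes taken by the new worker


def gamma_pdf(y, alpha=ALPHA, beta=BETA):
    """Gamma density y^(alpha-1) e^(-y/beta) / (beta^alpha Gamma(alpha))."""
    return y ** (alpha - 1) * exp(-y / beta) / (beta ** alpha * gamma(alpha))


def tchebysheff_bound(y0, alpha=ALPHA, beta=BETA):
    """Upper bound 1/k^2 on P(|Y - mu| >= k*sigma), with k = (y0 - mu) / sigma."""
    mean = alpha * beta
    sd = sqrt(alpha * beta ** 2)
    k = (y0 - mean) / sd
    return 1.0 / k ** 2


def upper_tail(y0, width=200.0, steps=200_000):
    """Midpoint-rule approximation to P(Y >= y0), truncating the integral at y0 + width."""
    h = width / steps
    return h * sum(gamma_pdf(y0 + (j + 0.5) * h) for j in range(steps))


bound = tchebysheff_bound(OBSERVED)
tail = upper_tail(OBSERVED)
print(f"Tchebysheff bound: {bound:.4f}")
print(f"numerical P(Y >= 22.5): {tail:.5f}")
```

Without rounding k, the bound is about .0467 (the example's .0466 comes from rounding k to 4.63). The numerically integrated tail probability is considerably smaller than the bound, which illustrates how conservative Tchebysheff's theorem can be.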


Exercises


4.146 A manufacturer of tires wants to advertise a mileage interval that excludes no more than 10%
of the mileage on tires he sells. All he knows is that, for a large number of tires tested, the mean
mileage was 25,000 miles, and the standard deviation was 4000 miles. What interval would
you suggest?


4.147 A machine used to fill cereal boxes dispenses, on the average, μ ounces per box. The
manufacturer wants the actual ounces dispensed Y to be within 1 ounce of μ at least 75% of the
time. What is the largest value of σ, the standard deviation of Y, that can be tolerated if the
manufacturer's objectives are to be met?


4.148 Find P(|Y − μ| ≤ 2σ) for Exercise 4.16. Compare with the corresponding probabilistic
statements given by Tchebysheff's theorem and the empirical rule.


4.149 Find P(|Y − μ| ≤ 2σ) for the uniform random variable. Compare with the corresponding
probabilistic statements given by Tchebysheff's theorem and the empirical rule.


4.150 Find P(|Y − μ| ≤ 2σ) for the exponential random variable. Compare with the corresponding
probabilistic statements given by Tchebysheff's theorem and the empirical rule.


4.151 Refer to Exercise 4.92. Would you expect C to exceed 2000 very often?


4.152 Refer to Exercise 4.109. Find an interval that will contain L for at least 89% of the weeks that
the machine is in use.


4.153 Refer to Exercise 4.129. Find an interval for which the probability that C will lie within it is at
least .75.


4.154 Suppose that Y is a χ² distributed random variable with ν = 7 degrees of freedom.

a What are the mean and variance of Y?
b Is it likely that Y will take on a value of 23 or more?
c Applet Exercise Use the applet Gamma Probabilities and Quantiles to find P(Y > 23).


Expectations of Discontinuous 
Functions and Mixed Probability 
Distributions (Optional) 


Problems in probability and statistics sometimes involve functions that are partly 
continuous and partly discrete, in one of two ways. First, we may be interested in the 
properties, perhaps the expectation, of a random variable g (Y) that is a discontinuous 
function of a discrete or continuous random variable Y . Second, the random variable 
of interest itself may have a distribution function that is continuous over some intervals 
and such that some isolated points have positive probabilities. 

We illustrate these ideas with the following examples. 


EXAMPLE 4.18

A retailer for a petroleum product sells a random amount Y each day. Suppose that
Y, measured in thousands of gallons, has the probability density function

\[
f(y) = \begin{cases} (3/8)\,y^{2}, & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

The retailer's profit turns out to be $100 for each 1000 gallons sold (10¢ per gallon)
if Y ≤ 1 and $40 extra per 1000 gallons (an extra 4¢ per gallon) if Y > 1. Find the
retailer's expected profit for any given day.


Solution

Let g(Y) denote the retailer's daily profit. Then

\[
g(Y) = \begin{cases} 100\,Y, & 0 \le Y \le 1, \\ 140\,Y, & 1 < Y \le 2. \end{cases}
\]

We want to find expected profit; by Theorem 4.4, the expectation is

\[
\begin{aligned}
E[g(Y)] &= \int_{-\infty}^{\infty} g(y) f(y)\,dy \\
&= \int_{0}^{1} 100y\left[\left(\frac{3}{8}\right)y^{2}\right] dy + \int_{1}^{2} 140y\left[\left(\frac{3}{8}\right)y^{2}\right] dy \\
&= \frac{300}{(8)(4)}\,y^{4}\Big|_{0}^{1} + \frac{420}{(8)(4)}\,y^{4}\Big|_{1}^{2} \\
&= \frac{300}{32}(1) + \frac{420}{32}(15) = 206.25.
\end{aligned}
\]

Thus, the retailer can expect a profit of $206.25 on the daily sale of this particular
product.
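The piecewise integral in Example 4.18 can be reproduced numerically by integrating each branch of g separately, so that the jump at y = 1 falls on an interval boundary. The sketch below uses a composite Simpson rule of our own writing (exact for the cubic integrands here, up to rounding); the function names are illustrative and not from the text.

```python
def simpson(func, a, b, n=100):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def density(y):
    """f(y) = (3/8) y^2 on [0, 2], zero elsewhere."""
    return 0.375 * y * y if 0.0 <= y <= 2.0 else 0.0


def expected_profit():
    """E[g(Y)] with g(y) = 100y on [0, 1] and 140y on (1, 2], one integral per branch."""
    low_branch = simpson(lambda y: 100.0 * y * density(y), 0.0, 1.0)
    high_branch = simpson(lambda y: 140.0 * y * density(y), 1.0, 2.0)
    return low_branch + high_branch


print(round(expected_profit(), 2))
```

Splitting at the discontinuity is the design choice that matters: applying a single quadrature rule across y = 1 would smear the jump in g and lose accuracy.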


Suppose that Y denotes the amount paid out per policy in one year by an insurance 
company that provides automobile insurance. For many policies, Y = 0 because the 
insured individuals are not involved in accidents. For insured individuals who do have 
accidents, the amount paid by the company might be modeled with one of the density 
functions that we have previously studied. A random variable Y that has some of 
its probability at discrete points (0 in this example) and the remainder spread over 
intervals is said to have a mixed distribution. Let F(y) denote a distribution function
of a random variable Y that has a mixed distribution. For all practical purposes, any
mixed distribution function F(y) can be written uniquely as

\[ F(y) = c_1 F_1(y) + c_2 F_2(y), \]

where F₁(y) is a step distribution function, F₂(y) is a continuous distribution function,
c₁ is the accumulated probability of all discrete points, and c₂ = 1 − c₁ is the
accumulated probability of all continuous portions.

The following example gives an illustration of a mixed distribution. 


EXAMPLE 4.19

Let Y denote the length of life (in hundreds of hours) of electronic components.
These components frequently fail immediately upon insertion into a system. It has
been observed that the probability of immediate failure is 1/4. If a component does
not fail immediately, the distribution for its length of life has the exponential density
function

\[
f(y) = \begin{cases} e^{-y}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find the distribution function for Y and evaluate P(Y > 10).

Solution

There is only one discrete point, y = 0, and this point has probability 1/4. Hence,
\(c_1 = 1/4\) and \(c_2 = 3/4\). It follows that Y is a mixture of the distributions of two
random variables, \(X_1\) and \(X_2\), where \(X_1\) has probability 1 at point 0 and \(X_2\) has the
given exponential density. That is,

\[
F_1(y) = \begin{cases} 0, & y < 0, \\ 1, & y \ge 0, \end{cases}
\]

and

\[
F_2(y) = \begin{cases} 0, & y < 0, \\ \int_0^y e^{-x}\, dx = 1 - e^{-y}, & y \ge 0. \end{cases}
\]

Now

\[
F(y) = (1/4) F_1(y) + (3/4) F_2(y),
\]

and, hence,

\[
\begin{aligned}
P(Y > 10) &= 1 - P(Y \le 10) = 1 - F(10) \\
          &= 1 - [(1/4) + (3/4)(1 - e^{-10})] \\
          &= (3/4)[1 - (1 - e^{-10})] = (3/4)e^{-10}.
\end{aligned}
\]

A graph of \(F(y)\) is given in Figure 4.18.

FIGURE 4.18 Distribution function F(y) for Example 4.19
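The mixed distribution function of Example 4.19 is easy to evaluate numerically. The
following is a minimal sketch, assuming the weights 1/4 and 3/4 and the unit-mean
exponential component from the example; the function name mixed_cdf is hypothetical.

```python
import math


def mixed_cdf(y, c1=0.25, c2=0.75):
    """F(y) = c1*F1(y) + c2*F2(y) for the mixed distribution of Example 4.19.

    F1 is the step function that jumps to 1 at y = 0 (immediate failure);
    F2 is the exponential distribution function 1 - exp(-y) for y >= 0.
    """
    step_part = 1.0 if y >= 0 else 0.0
    continuous_part = 1.0 - math.exp(-y) if y >= 0 else 0.0
    return c1 * step_part + c2 * continuous_part


tail_probability = 1.0 - mixed_cdf(10.0)    # P(Y > 10)
closed_form = 0.75 * math.exp(-10.0)        # (3/4) e^{-10}
print(tail_probability, closed_form)
```

The jump of height 1/4 at y = 0 is what distinguishes this distribution function from that
of a purely continuous random variable.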


An easy method for finding expectations of random variables with mixed distributions
is given in Definition 4.15.

DEFINITION 4.15

Let Y have the mixed distribution function

\[
F(y) = c_1 F_1(y) + c_2 F_2(y)
\]

and suppose that \(X_1\) is a discrete random variable with distribution function
\(F_1(y)\) and that \(X_2\) is a continuous random variable with distribution function
\(F_2(y)\). Let \(g(Y)\) denote a function of Y. Then

\[
E[g(Y)] = c_1 E[g(X_1)] + c_2 E[g(X_2)].
\]
EXAMPLE 4.20

Find the mean and variance of the random variable defined in Example 4.19.

Solution

With all definitions as in Example 4.19, it follows that

\[
E(X_1) = 0 \quad \text{and} \quad E(X_2) = \int_0^\infty y e^{-y}\, dy = 1.
\]

Therefore,

\[
\mu = E(Y) = (1/4) E(X_1) + (3/4) E(X_2) = 3/4.
\]

Also,

\[
E(X_1^2) = 0 \quad \text{and} \quad E(X_2^2) = \int_0^\infty y^2 e^{-y}\, dy = 2.
\]

Therefore,

\[
E(Y^2) = (1/4) E(X_1^2) + (3/4) E(X_2^2) = (1/4)(0) + (3/4)(2) = 3/2.
\]

Then

\[
V(Y) = E(Y^2) - \mu^2 = (3/2) - (3/4)^2 = 15/16.
\]
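The mean and variance just obtained can be reproduced directly from Definition 4.15. This
sketch assumes the components of Example 4.19: a point mass at 0 with weight 1/4 and an
exponential with mean 1 and weight 3/4, whose kth moment is k!. The name mixed_moment is
illustrative.

```python
from fractions import Fraction
from math import factorial


def mixed_moment(k, c1=Fraction(1, 4), c2=Fraction(3, 4)):
    """E(Y^k) = c1*E(X1^k) + c2*E(X2^k) for k >= 1.

    X1 equals 0 with probability 1, so E(X1^k) = 0.
    X2 is exponential with mean 1, so E(X2^k) = k!.
    """
    discrete_moment = 0
    continuous_moment = factorial(k)
    return c1 * discrete_moment + c2 * continuous_moment


mean = mixed_moment(1)
second_moment = mixed_moment(2)
variance = second_moment - mean ** 2
print(mean, second_moment, variance)
```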


*4.155 


*4.156 
Exercises


A builder of houses needs to order some supplies that have a waiting time Y for delivery, 
with a continuous uniform distribution over the interval from 1 to 4 days. Because she can get 
by without them for 2 days, the cost of the delay is fixed at $100 for any waiting time up to 
2 days. After 2 days, however, the cost of the delay is $100 plus $20 per day (prorated) for 
each additional day. That is, if the waiting time is 3.5 days, the cost of the delay is $100 +
$20(1.5) = $130. Find the expected value of the builder's cost due to waiting for supplies.


The duration Y of long-distance telephone calls (in minutes) monitored by a station is a random 
variable with the properties that 


P(Y = 3) = .2 and P(Y =6)=.1. 
Otherwise, Y has a continuous density function given by 


pal O к= 


р elsewhere. 
The discrete points at 3 and 6 are due to the fact that the length of the call is announced to the 


caller in three-minute intervals and the caller must pay for three minutes even if he talks less 
than three minutes. Find the expected duration of a randomly selected long-distance call. 


The life length Y of a component used in a complex electronic system is known to have an 
exponential density with a mean of 100 hours. The component is replaced at failure or at age 
200 hours, whichever comes first. 


a Find the distribution function for X, the length of time the component is in use. 
b Find E(X). 
Consider the nail-firing device of Example 4.15. When the device works, the nail is fired with
velocity, V, with density

\[
f(v) = \frac{v^3 e^{-v/500}}{(500)^4\, \Gamma(4)}.
\]

The device misfires 2% of the time it is used, resulting in a velocity of 0. Find the expected
kinetic energy associated with a nail of mass m. Recall that the kinetic energy, k, of a mass m
moving at velocity v is \(k = (mv^2)/2\).


A random variable Y has distribution function

\[
F(y) = \begin{cases}
0, & \text{if } y < 0, \\
y^2 + 0.1, & \text{if } 0 \le y < 0.5, \\
y, & \text{if } 0.5 \le y < 1, \\
1, & \text{if } y \ge 1.
\end{cases}
\]

a Give \(F_1(y)\) and \(F_2(y)\), the discrete and continuous components of \(F(y)\).
b Write \(F(y)\) as \(c_1 F_1(y) + c_2 F_2(y)\).
c Find the expected value and variance of Y.


4.12 Summary


This chapter presented probabilistic models for continuous random variables. The 
density function, which provides a model for a population frequency distribution as- 
sociated with a continuous random variable, subsequently will yield a mechanism 
for inferring characteristics of the population based on measurements contained in a 
sample taken from that population. As a consequence, the density function provides 
a model for a real distribution of data that exist or could be generated by repeated ex- 
perimentation. Similar distributions for small sets of data (samples from populations) 
were discussed in Chapter 1. 

Four specific types of density functions—uniform, normal, gamma (with the x? and 
exponential as special cases), and beta—were presented, providing a wide assortment 
of models for population frequency distributions. For your convenience, Table 4.1 
contains a summary of the R (or S-Plus) commands that provide probabilities and 
quantiles associated with these distributions. Many other density functions could be 
employed to fit real situations, but the four described suit many situations adequately. 
A few other density functions are presented in the exercises at the end of the chapter. 
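For readers working outside R, the normal and exponential rows of Table 4.1 have close
analogues in the Python standard library. The sketch below is one possible translation,
not part of the text's own toolset: the function names are hypothetical, and the
exponential is parameterized by its mean \(\beta\) (R's pexp and qexp take the rate
\(1/\beta\) instead). Gamma and beta probabilities are not available in the standard
library and would need a numerical package.

```python
import math
from statistics import NormalDist


def normal_cdf(y0, mu, sigma):
    """P(Y <= y0) for a normal random variable; analogue of pnorm(y0, mu, sigma)."""
    return NormalDist(mu, sigma).cdf(y0)


def normal_quantile(p, mu, sigma):
    """pth quantile of a normal random variable; analogue of qnorm(p, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)


def exp_cdf(y0, beta):
    """P(Y <= y0) for an exponential with mean beta; analogue of pexp(y0, 1/beta)."""
    return 1.0 - math.exp(-y0 / beta) if y0 > 0 else 0.0


def exp_quantile(p, beta):
    """pth quantile of an exponential with mean beta; analogue of qexp(p, 1/beta)."""
    return -beta * math.log(1.0 - p)


print(normal_cdf(82, 70, 12))        # probability within one standard deviation above the mean, plus 0.5
print(normal_quantile(0.90, 70, 12))
print(exp_quantile(0.5, 3.0))        # median of an exponential with mean 3
```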

The adequacy of a density function in modeling the frequency distribution for a random
variable depends upon the inference-making technique to be employed. If modest
disagreement between the model and the real population frequency distribution does
not affect the goodness of the inferential procedure, the model is adequate.


Table 4.1 R (and S-Plus) procedures giving probabilities and percentiles for some common
continuous distributions

Distribution    P(Y ≤ y0)              pth Quantile: φ_p Such That P(Y ≤ φ_p) = p
Normal          pnorm(y0, μ, σ)        qnorm(p, μ, σ)
Exponential     pexp(y0, 1/β)          qexp(p, 1/β)
Gamma           pgamma(y0, α, 1/β)     qgamma(p, α, 1/β)
Beta            pbeta(y0, α, β)        qbeta(p, α, β)


The latter part of the chapter concerned expectations, particularly moments and
moment-generating functions. It is important to focus attention on the reason for
presenting these quantities and to avoid excessive concentration on the mathematical
aspects of the material. Moments, particularly the mean and variance, are numerical
descriptive measures for random variables. Particularly, we will subsequently see that
it is sometimes difficult to find the probability distribution for a random variable Y or a
function g(Y), and we already have observed that integration over intervals for many
density functions (the normal and gamma, for example) is very difficult. When this
occurs, we can approximately describe the behavior of the random variable by using
its moments along with Tchebysheff's theorem and the empirical rule (Chapter 1).

Supplementary Exercises


4.160 Let the density function of a random variable Y be given by

\[
f(y) = \begin{cases} \dfrac{2}{\pi(1 + y^2)}, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the distribution function.
b Find E(Y).
The length of time required to complete a college achievement test is found to be normally
distributed with mean 70 minutes and standard deviation 12 minutes. When should the test be 
terminated if we wish to allow sufficient time for 90% of the students to complete the test? 


A manufacturing plant utilizes 3000 electric light bulbs whose length of life is normally dis- 
tributed with mean 500 hours and standard deviation 50 hours. To minimize the number of 
bulbs that burn out during operating hours, all the bulbs are replaced after a given period of 
operation. How often should the bulbs be replaced if we want not more than 1% of the bulbs 
to burn out between replacement periods? 


Refer to Exercise 4.66. Suppose that five bearings are randomly drawn from production. What 
is the probability that at least one is defective? 


The length of life of oil-drilling bits depends upon the types of rock and soil that the drill 
encounters, but it is estimated that the mean length of life is 75 hours. An oil exploration 
company purchases drill bits whose length of life is approximately normally distributed with 
mean 75 hours and standard deviation 12 hours. What proportion of the company's drill bits 


a will fail before 60 hours of use? 
b will last at least 60 hours? 


с will have to be replaced after more than 90 hours of use? 

Let Y have density function

\[
f(y) = \begin{cases} c y e^{-2y}, & 0 \le y < \infty, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of c that makes \(f(y)\) a density function.
b Give the mean and variance for Y.
c Give the moment-generating function for Y.


Use the fact that

\[
e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + \cdots
\]

to expand the moment-generating function of Example 4.16 into a series to find \(\mu_1\), \(\mu_2\), \(\mu_3\),
and \(\mu_4\) for the normal random variable.


Find an expression for \(\mu'_k = E(Y^k)\), where the random variable Y has a beta distribution.


The number of arrivals N at a supermarket checkout counter in the time interval from 0 to t
follows a Poisson distribution with mean \(\lambda t\). Let T denote the length of time until the first
arrival. Find the density function for T. [Note: \(P(T > t_0) = P(N = 0 \text{ at } t = t_0)\).]


An argument similar to that of Exercise 4.168 can be used to show that if events are occurring
in time according to a Poisson distribution with mean \(\lambda t\), then the interarrival times between
events have an exponential distribution with mean \(1/\lambda\). If calls come into a police emergency
center at the rate of ten per hour, what is the probability that more than 15 minutes will elapse
between the next two calls?


Refer to Exercise 4.168.

a If U is the time until the second arrival, show that U has a gamma density function with
\(\alpha = 2\) and \(\beta = 1/\lambda\).
b Show that the time until the kth arrival has a gamma density with \(\alpha = k\) and \(\beta = 1/\lambda\).


Suppose that customers arrive at a checkout counter at a rate of two per minute.

a What are the mean and variance of the waiting times between successive customer arrivals?
b If a clerk takes three minutes to serve the first customer arriving at the counter, what is the
probability that at least one more customer will be waiting when the service to the first
customer is completed?


Calls for dial-in connections to a computer center arrive at an average rate of four per minute. 
The calls follow a Poisson distribution. If a call arrives at the beginning of a one-minute interval, 
what is the probability that a second call will not arrive in the next 20 seconds? 


Suppose that plants of a particular species are randomly dispersed over an area so that the 
number of plants in a given area follows a Poisson distribution with a mean density of \(\lambda\) plants
per unit area. If a plant is randomly selected in this area, find the probability density function 
of the distance to the nearest neighboring plant. [Hint: If R denotes the distance to the nearest 
neighbor, then P(R > r) is the same as the probability of seeing no plants in a circle of 
radius r.] 


The time (in hours) a manager takes to interview a job applicant has an exponential distribution
with \(\beta = 1/2\). The applicants are scheduled at quarter-hour intervals, beginning at 8:00 A.M.,
and the applicants arrive exactly on time. When the applicant with an 8:15 A.M. appointment
arrives at the manager's office, what is the probability that he will have to wait before seeing
the manager?


The median value y of a continuous random variable is that value such that \(F(y) = .5\). Find
the median value of the random variable in Exercise 4.11.


If Y has an exponential distribution with mean \(\beta\), find (as a function of \(\beta\)) the median of Y.


Applet Exercise Use the applet Gamma Probabilities and Quantiles to find the medians of
gamma distributed random variables with parameters

a \(\alpha = 1\), \(\beta = 3\). Compare your answer with that in Exercise 4.176.
b \(\alpha = 2\), \(\beta = 2\). Is the median larger or smaller than E(Y)?
c \(\alpha = 5\), \(\beta = 10\). Is the median larger or smaller than E(Y)?
d In all of these cases, the mean exceeds the median. How is that reflected in the shapes of
the corresponding densities?


Graph the beta probability density function for \(\alpha = 3\) and \(\beta = 2\).

a If Y has this beta density function, find \(P(.1 \le Y \le .2)\) by using binomial probabilities to
evaluate \(F(y)\). (See Section 4.7.)
b Applet Exercise If Y has this beta density function, find \(P(.1 \le Y \le .2)\), using the applet
Beta Probabilities and Quantiles.
c Applet Exercise If Y has this beta density function, use the applet Beta Probabilities and
Quantiles to find the .05 and .95-quantiles for Y.
d What is the probability that Y falls between the two quantiles you found in part (c)?


A retail grocer has a daily demand Y for a certain food sold by the pound, where Y (measured
in hundreds of pounds) has a probability density function given by

\[
f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

(She cannot stock over 100 pounds.) The grocer wants to order 100k pounds of food. She buys
the food at 6¢ per pound and sells it at 10¢ per pound. What value of k will maximize her
expected daily profit?


Suppose that Y has a gamma distribution with \(\alpha = 3\) and \(\beta = 1\).


a Use Poisson probabilities to evaluate P(Y < 4). (See Exercise 4.99.) 
b Applet Exercise Use the applet Gamma Probabilities and Quantiles to find P(Y < 4). 


Suppose that Y is a normally distributed random variable with mean \(\mu\) and variance \(\sigma^2\). Use
the results of Example 4.16 to find the moment-generating function, mean, and variance of

\[
Z = \frac{Y - \mu}{\sigma}.
\]

What is the distribution of Z? Why?


A random variable Y is said to have a log-normal distribution if X = ln(Y) has a normal
distribution. (The symbol ln denotes natural logarithm.) In this case Y must be nonnegative.
The shape of the log-normal probability density function is similar to that of the gamma
distribution, with a long tail to the right. The equation of the log-normal density function is
given by

\[
f(y) = \begin{cases} \dfrac{1}{\sigma y \sqrt{2\pi}}\, e^{-(\ln(y) - \mu)^2/(2\sigma^2)}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Because ln(y) is a monotonic function of y,

\[
P(Y \le y) = P[\ln(Y) \le \ln(y)] = P[X \le \ln(y)],
\]

where X has a normal distribution with mean \(\mu\) and variance \(\sigma^2\). Thus, probabilities for random
variables with a log-normal distribution can be found by transforming them into probabilities
that can be computed using the ordinary normal distribution. If Y has a log-normal distribution
with \(\mu = 4\) and \(\sigma^2 = 1\), find

a \(P(Y \le 4)\).
b \(P(Y > 8)\).


If Y has a log-normal distribution with parameters \(\mu\) and \(\sigma^2\), it can be shown that

\[
E(Y) = e^{\mu + \sigma^2/2} \quad \text{and} \quad V(Y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).
\]

The grains composing polycrystalline metals tend to have weights that follow a log-normal
distribution. For a type of aluminum, gram weights have a log-normal distribution with \(\mu = 3\)
and \(\sigma = 4\) (in units of 10 g).

a Find the mean and variance of the grain weights.
b Find an interval in which at least 75% of the grain weights should lie. [Hint: Use
Tchebysheff's theorem.]
c Find the probability that a randomly chosen grain weighs less than the mean grain weight.


Let Y denote a random variable with probability density function given by

\[
f(y) = (1/2) e^{-|y|}, \quad -\infty < y < \infty.
\]

Find the moment-generating function of Y and use it to find E(Y).


Let \(f_1(y)\) and \(f_2(y)\) be density functions and let a be a constant such that \(0 \le a \le 1\). Consider
the function \(f(y) = a f_1(y) + (1 - a) f_2(y)\).


*4.186 


*4.187 


*4.188 


*4.189 


*4.190 




a Show that \(f(y)\) is a density function. Such a density function is often referred to as a
mixture of two density functions.

b Suppose that \(Y_1\) is a random variable with density function \(f_1(y)\) and that \(E(Y_1) = \mu_1\) and
\(\mathrm{Var}(Y_1) = \sigma_1^2\); and similarly suppose that \(Y_2\) is a random variable with density function
\(f_2(y)\) and that \(E(Y_2) = \mu_2\) and \(\mathrm{Var}(Y_2) = \sigma_2^2\). Assume that Y is a random variable whose
density is a mixture of the densities corresponding to \(Y_1\) and \(Y_2\). Show that

i \(E(Y) = a\mu_1 + (1 - a)\mu_2\).
ii \(\mathrm{Var}(Y) = a\sigma_1^2 + (1 - a)\sigma_2^2 + a(1 - a)[\mu_1 - \mu_2]^2\).
[Hint: \(E(Y_i^2) = \mu_i^2 + \sigma_i^2\), \(i = 1, 2\).]


The random variable Y, with a density function given by

\[
f(y) = \frac{m y^{m-1}}{\alpha}\, e^{-y^m/\alpha}, \quad 0 < y < \infty; \ \alpha, m > 0,
\]

is said to have a Weibull distribution. The Weibull density function provides a good model
for the distribution of length of life for many mechanical devices and biological plants and
animals. Find the mean and variance for a Weibull distributed random variable with m = 2.


Refer to Exercise 4.186. Resistors used in the construction of an aircraft guidance system have
life lengths that follow a Weibull distribution with m = 2 and \(\alpha = 10\) (with measurements in
thousands of hours).

a Find the probability that the life length of a randomly selected resistor of this type exceeds
5000 hours.
b If three resistors of this type are operating independently, find the probability that exactly
one of the three will burn out prior to 5000 hours of use.


Refer to Exercise 4.186.

a What is the usual name of the distribution of a random variable that has a Weibull
distribution with m = 1?
b Derive, in terms of the parameters \(\alpha\) and m, the mean and variance of a Weibull distributed
random variable.

If n > 2 is an integer, the distribution with density given by

\[
f(y) = \begin{cases} \dfrac{1}{B(1/2, [n - 2]/2)}\, (1 - y^2)^{(n-4)/2}, & -1 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\]

is called the r distribution. Derive the mean and variance of a random variable with the r
distribution.


A function sometimes associated with continuous nonnegative random variables is the failure
rate (or hazard rate) function, which is defined by

\[
r(t) = \frac{f(t)}{1 - F(t)}
\]

for a density function \(f(t)\) with corresponding distribution function \(F(t)\). If we think of the
random variable in question as being the length of life of a component, \(r(t)\) is proportional to
the probability of failure in a small interval after t, given that the component has survived up
to time t. Show that,

a for an exponential density function, \(r(t)\) is constant.
b for a Weibull density function with m > 1, \(r(t)\) is an increasing function of t. (See Exercise
4.186.)
*4.191


*4.192 


*4.193 


*4.194 


*4.195 


Suppose that Y is a continuous random variable with distribution function given by \(F(y)\) and
probability density function \(f(y)\). We often are interested in conditional probabilities of the
form \(P(Y \le y \mid Y \ge c)\) for a constant c.

a Show that, for \(y \ge c\),

\[
P(Y \le y \mid Y \ge c) = \frac{F(y) - F(c)}{1 - F(c)}.
\]

b Show that the function in part (a) has all the properties of a distribution function.
c If the length of life Y for a battery has a Weibull distribution with m = 2 and \(\alpha = 3\) (with
measurements in years), find the probability that the battery will last less than four years,
given that it is now two years old.


The velocities of gas particles can be modeled by the Maxwell distribution, whose probability
density function is given by

\[
f(v) = 4\pi \left(\frac{m}{2\pi K T}\right)^{3/2} v^2 e^{-v^2 (m/[2KT])}, \quad v > 0,
\]

where m is the mass of the particle, K is Boltzmann's constant, and T is the absolute
temperature.


a Find the mean velocity of these particles. 
b The kinetic energy of a particle is given by (1/2)mV?. Find the mean kinetic energy for a 
particle. 


Because

\[
P(Y \le y \mid Y \ge c) = \frac{F(y) - F(c)}{1 - F(c)}
\]

has the properties of a distribution function, its derivative will have the properties of a probability
density function. This derivative is given by

\[
\frac{f(y)}{1 - F(c)}, \quad y \ge c.
\]

We can thus find the expected value of Y, given that Y is greater than c, by using

\[
E(Y \mid Y \ge c) = \frac{1}{1 - F(c)} \int_c^\infty y f(y)\, dy.
\]

If Y, the length of life of an electronic component, has an exponential distribution with mean 
100 hours, find the expected value of Y, given that this component already has been in use for 
50 hours. 


We can show that the normal density function integrates to unity by showing that, if u > 0,

\[
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1/2)uy^2}\, dy = \frac{1}{\sqrt{u}}.
\]

This, in turn, can be shown by considering the product of two such integrals:

\[
\frac{1}{2\pi} \left(\int_{-\infty}^{\infty} e^{-(1/2)uy^2}\, dy\right) \left(\int_{-\infty}^{\infty} e^{-(1/2)ux^2}\, dx\right)
= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(1/2)u(x^2 + y^2)}\, dx\, dy.
\]

By transforming to polar coordinates, show that the preceding double integral is equal to 1/u.


Let Z be a standard normal random variable and \(W = (Z^2 + 3Z)^2\).

a Use the moments of Z (see Exercise 4.199) to derive the mean of W.
b Use the result given in Exercise 4.198 to find a value of w such that \(P(W \le w) \ge .90\).


*4.196 


*4,197 


*4.198 


*4.199 
Show that \(\Gamma(1/2) = \sqrt{\pi}\) by writing

\[
\Gamma(1/2) = \int_0^\infty y^{-1/2} e^{-y}\, dy,
\]

by making the transformation \(y = (1/2)x^2\) and by employing the result of Exercise 4.194.


The function \(B(\alpha, \beta)\) is defined by

\[
B(\alpha, \beta) = \int_0^1 y^{\alpha - 1} (1 - y)^{\beta - 1}\, dy.
\]

a Letting \(y = \sin^2 \theta\), show that

\[
B(\alpha, \beta) = 2 \int_0^{\pi/2} \sin^{2\alpha - 1}\theta\, \cos^{2\beta - 1}\theta\, d\theta.
\]

b Write \(\Gamma(\alpha)\Gamma(\beta)\) as a double integral, transform to polar coordinates, and conclude that

\[
B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.
\]


The Markov Inequality Let \(g(Y)\) be a function of the continuous random variable Y, with
\(E(|g(Y)|) < \infty\). Show that, for every positive constant k,

\[
P(|g(Y)| \le k) \ge 1 - \frac{E(|g(Y)|)}{k}.
\]

[Note: This inequality also holds for discrete random variables, with an obvious adaptation in
the proof.]


Let Z be a standard normal random variable.

a Show that the expected values of all odd integer powers of Z are 0. That is, if i = 1, 2, ...,
show that \(E(Z^{2i-1}) = 0\). [Hint: A function \(g(\cdot)\) is an odd function if, for all y, \(g(-y) = -g(y)\).
For any odd function \(g(y)\), \(\int_{-\infty}^{\infty} g(y)\, dy = 0\), if the integral exists.]

b If i = 1, 2, ..., show that

\[
E(Z^{2i}) = \frac{2^i\, \Gamma\left(i + \tfrac{1}{2}\right)}{\sqrt{\pi}}.
\]

[Hint: A function \(h(\cdot)\) is an even function if, for all y, \(h(-y) = h(y)\). For any even function
\(h(y)\), \(\int_{-\infty}^{\infty} h(y)\, dy = 2\int_0^{\infty} h(y)\, dy\), if the integrals exist. Use this fact, make the change
of variable \(w = z^2/2\), and use what you know about the gamma function.]

c Use the results in part (b) and in Exercises 4.81(b) and 4.194 to derive \(E(Z^2)\), \(E(Z^4)\),
\(E(Z^6)\), and \(E(Z^8)\).

d If i = 1, 2, ..., show that

\[
E(Z^{2i}) = \prod_{j=1}^{i} (2j - 1).
\]

This implies that the ith even moment is the product of the first i odd integers.
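The two expressions for the even moments in parts (b) and (d) can be compared numerically
before attempting the proofs. The sketch below is only a sanity check, not a derivation;
the function names are hypothetical and math.prod assumes Python 3.8 or later.

```python
import math


def even_moment_gamma(i):
    """Gamma-function form of E(Z^(2i)): 2^i * Gamma(i + 1/2) / sqrt(pi)."""
    return 2 ** i * math.gamma(i + 0.5) / math.sqrt(math.pi)


def even_moment_product(i):
    """Product form of E(Z^(2i)): the product of the first i odd integers."""
    return math.prod(2 * j - 1 for j in range(1, i + 1))


for i in range(1, 5):
    print(i, even_moment_gamma(i), even_moment_product(i))
```

Agreement for the first few values of i suggests the identity but does not establish it;
the exercise still asks for the general argument.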

Suppose that Y has a beta distribution with parameters \(\alpha\) and \(\beta\).

a If a is any positive or negative value such that \(\alpha + a > 0\), show that

\[
E(Y^a) = \frac{\Gamma(\alpha + a)\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\alpha + \beta + a)}.
\]

b Why did your answer in part (a) require that \(\alpha + a > 0\)?
c Show that, with a = 1, the result in part (a) gives \(E(Y) = \alpha/(\alpha + \beta)\).
d Use the result in part (a) to give an expression for \(E(\sqrt{Y})\). What do you need to assume
about \(\alpha\)?
e Use the result in part (a) to give an expression for \(E(1/Y)\), \(E(1/\sqrt{Y})\), and \(E(1/Y^2)\). What
do you need to assume about \(\alpha\) in each case?
Introduction


The intersection of two or more events is frequently of interest to an experimenter. 
For example, a gambler playing blackjack is interested in the event of drawing both 
an ace and a face card from a 52-card deck. A biologist, observing the number of 
animals surviving in a litter, is concerned about the intersection of these events: 


A: The litter contains n animals. 
B: y animals survive. 


Similarly, observing both the height and the weight of an individual represents the 
intersection of a specific pair of events associated with height-weight measurements. 

Most important to statisticians are intersections that occur in the course of sampling.
Suppose that \(Y_1, Y_2, \ldots, Y_n\) denote the outcomes of n successive trials of
an experiment. For example, this sequence could represent the weights of n people
or the measurements of n physical characteristics for a single person. A specific set
of outcomes, or sample measurements, may be expressed in terms of the intersection of
the n events \((Y_1 = y_1), (Y_2 = y_2), \ldots, (Y_n = y_n)\), which we will denote as
\((Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)\), or, more compactly, as \((y_1, y_2, \ldots, y_n)\). Calculation of the
probability of this intersection is essential in making inferences about the population
from which the sample was drawn and is a major reason for studying multivariate
probability distributions.


5.2 Bivariate and Multivariate Probability Distributions


Many random variables can be defined over the same sample space. For example, 
consider the experiment of tossing a pair of dice. The sample space contains 36 
sample points, corresponding to the mn = (6)(6) = 36 ways in which numbers may 
appear on the faces of the dice. Any one of the following random variables could be 
defined over the sample space and might be of interest to the experimenter: 


\(Y_1\): The number of dots appearing on die 1.

\(Y_2\): The number of dots appearing on die 2.

\(Y_3\): The sum of the number of dots on the dice.

\(Y_4\): The product of the number of dots appearing on the dice.


The 36 sample points associated with the experiment are equiprobable and correspond
to the 36 numerical events \((y_1, y_2)\). Thus, throwing a pair of 1s is the simple
event (1, 1). Throwing a 2 on die 1 and a 3 on die 2 is the simple event (2, 3). Because
all pairs \((y_1, y_2)\) occur with the same relative frequency, we assign probability 1/36
to each sample point. For this simple example, the intersection \((y_1, y_2)\) contains at
most one sample point. Hence, the bivariate probability function is

\[
p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = 1/36, \quad y_1 = 1, 2, \ldots, 6, \ y_2 = 1, 2, \ldots, 6.
\]


A graph of the bivariate probability function for the die-tossing experiment is 
shown in Figure 5.1. Notice that a nonzero probability is assigned to a point (y1, y2) 
in the plane if and only if y; = 1, 2,..., 6 and y2 = 1, 2,..., 6. Thus, exactly 
36 points in the plane are assigned nonzero probabilities. Further, the probabilities 
are assigned in such a way that the sum of the nonzero probabilities is equal to 1. In 
Figure 5.1 the points assigned nonzero probabilities are represented in the (y1, y2) 
plane, whereas the probabilities associated with these points are given by the lengths 
of the lines above them. Figure 5.1 may be viewed as a theoretical, three-dimensional 
relative frequency histogram for the pairs of observations \((y_1, y_2)\). As in the
single-variable discrete case, the theoretical histogram provides a model for the sample
histogram that would be obtained if the die-tossing experiment were repeated a large 
number of times. 


FIGURE 5.1 Bivariate probability function; \(y_1\) = number of dots on die 1, \(y_2\) = number
of dots on die 2


DEFINITION 5.1

Let \(Y_1\) and \(Y_2\) be discrete random variables. The joint (or bivariate) probability
function for \(Y_1\) and \(Y_2\) is given by

\[
p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2), \quad -\infty < y_1 < \infty, \ -\infty < y_2 < \infty.
\]


In the single-variable case discussed in Chapter 3, we saw that the probability
function for a discrete random variable Y assigns nonzero probabilities to a finite or
countable number of distinct values of Y in such a way that the sum of the probabilities
is equal to 1. Similarly, in the bivariate case the joint probability function \(p(y_1, y_2)\)
assigns nonzero probabilities to only a finite or countable number of pairs of values
\((y_1, y_2)\). Further, the nonzero probabilities must sum to 1.


THEOREM 5.1

If \(Y_1\) and \(Y_2\) are discrete random variables with joint probability function
\(p(y_1, y_2)\), then

1. \(p(y_1, y_2) \ge 0\) for all \(y_1, y_2\).
2. \(\sum_{y_1, y_2} p(y_1, y_2) = 1\), where the sum is over all values \((y_1, y_2)\) that are
assigned nonzero probabilities.


As in the univariate discrete case, the joint probability function for discrete random
variables is sometimes called the joint probability mass function because it specifies
the probability (mass) associated with each of the possible pairs of values for the
random variables. Once the joint probability function has been determined for discrete
random variables \(Y_1\) and \(Y_2\), calculating joint probabilities involving \(Y_1\) and \(Y_2\) is
straightforward. For the die-tossing experiment, \(P(2 \le Y_1 \le 3, 1 \le Y_2 \le 2)\) is

\[
P(2 \le Y_1 \le 3, 1 \le Y_2 \le 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2) = 4/36 = 1/9.
\]
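The die-tossing probabilities can be reproduced by enumerating the 36 sample points. This
is a minimal sketch using exact fractions; the names joint_pmf and prob are illustrative
and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Joint probability function for two fair dice: every pair (y1, y2) has mass 1/36.
joint_pmf = {
    (y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)
}


def prob(event):
    """Sum p(y1, y2) over the sample points for which event(y1, y2) is true."""
    return sum(p for (y1, y2), p in joint_pmf.items() if event(y1, y2))


total_mass = sum(joint_pmf.values())
p_rectangle = prob(lambda y1, y2: 2 <= y1 <= 3 and 1 <= y2 <= 2)
print(total_mass, p_rectangle)
```

Any event defined in terms of the two dice, such as a condition on their sum, can be passed
to prob in the same way.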


EXAMPLE 5.1

A local supermarket has three checkout counters. Two customers arrive at the counters
at different times when the counters are serving no other customers. Each customer
chooses a counter at random, independently of the other. Let \(Y_1\) denote the number
of customers who choose counter 1 and \(Y_2\), the number who select counter 2. Find
the joint probability function of \(Y_1\) and \(Y_2\).


Solution

We might proceed with the derivation in many ways. The most direct is to consider
the sample space associated with the experiment. Let the pair {i, j} denote the simple
event that the first customer chose counter i and the second customer chose counter
j, where i, j = 1, 2, and 3. Using the mn rule, the sample space consists of 3 x 3 = 9
sample points. Under the assumptions given earlier, each sample point is equally
likely and has probability 1/9. The sample space associated with the experiment is

S = {{1, 1}, {1, 2}, {1, 3}, {2, 1}, {2, 2}, {2, 3}, {3, 1}, {3, 2}, {3, 3}}.

Notice that sample point {1, 1} is the only sample point corresponding to \((Y_1 = 2,
Y_2 = 0)\) and hence \(P(Y_1 = 2, Y_2 = 0) = 1/9\). Similarly, \(P(Y_1 = 1, Y_2 = 1) =
P(\{1, 2\} \text{ or } \{2, 1\}) = 2/9\). Table 5.1 contains the probabilities associated with each
possible pair of values for \(Y_1\) and \(Y_2\), that is, the joint probability function for \(Y_1\) and
\(Y_2\). As always, the results of Theorem 5.1 hold for this example.


Table 5.1 Probability function for Y1 and Y2, Example 5.1

              y1
 y2      0      1      2
  0     1/9    2/9    1/9
  1     2/9    2/9     0
  2     1/9     0      0
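Table 5.1 can be rebuilt by listing the nine equally likely sample points of Example 5.1
and counting how many customers chose counters 1 and 2 in each. The sketch below follows
that direct approach; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

counters = (1, 2, 3)
# Each sample point (i, j): first customer picks counter i, second picks counter j.
sample_space = list(product(counters, repeat=2))
point_probability = Fraction(1, len(sample_space))

joint = {}
for choice in sample_space:
    y1 = choice.count(1)   # number of customers choosing counter 1
    y2 = choice.count(2)   # number of customers choosing counter 2
    joint[(y1, y2)] = joint.get((y1, y2), Fraction(0)) + point_probability

# Print the table with rows indexed by y2 and columns by y1, as in Table 5.1.
for y2 in range(3):
    print(y2, [str(joint.get((y1, y2), Fraction(0))) for y1 in range(3)])
```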


As in the case of univariate random variables, the distinction between jointly
discrete and jointly continuous random variables may be characterized in terms of
their (joint) distribution functions.


DEFINITION 5.2

For any random variables \(Y_1\) and \(Y_2\), the joint (bivariate) distribution function
\(F(y_1, y_2)\) is

\[
F(y_1, y_2) = P(Y_1 \le y_1, Y_2 \le y_2), \quad -\infty < y_1 < \infty, \ -\infty < y_2 < \infty.
\]


For two discrete variables \(Y_1\) and \(Y_2\), \(F(y_1, y_2)\) is given by

\[
F(y_1, y_2) = \sum_{t_1 \le y_1} \sum_{t_2 \le y_2} p(t_1, t_2).
\]

For the die-tossing experiment,

\[
\begin{aligned}
F(2, 3) &= P(Y_1 \le 2, Y_2 \le 3) \\
        &= p(1, 1) + p(1, 2) + p(1, 3) + p(2, 1) + p(2, 2) + p(2, 3).
\end{aligned}
\]

Because \(p(y_1, y_2) = 1/36\) for all pairs of values of \(y_1\) and \(y_2\) under consideration,
\(F(2, 3) = 6/36 = 1/6\).


EXAMPLE 5.2

Consider the random variables \(Y_1\) and \(Y_2\) of Example 5.1. Find \(F(-1, 2)\), \(F(1.5, 2)\),
and \(F(5, 7)\).


Solution

Using the results in Table 5.1, we see that

\[
F(-1, 2) = P(Y_1 \le -1, Y_2 \le 2) = P(\emptyset) = 0.
\]

Further,

\[
\begin{aligned}
F(1.5, 2) &= P(Y_1 \le 1.5, Y_2 \le 2) \\
          &= p(0, 0) + p(0, 1) + p(0, 2) + p(1, 0) + p(1, 1) + p(1, 2) = 8/9.
\end{aligned}
\]

Similarly,

\[
F(5, 7) = P(Y_1 \le 5, Y_2 \le 7) = 1.
\]

Notice that \(F(y_1, y_2) = 1\) for all \(y_1, y_2\) such that \(\min\{y_1, y_2\} \ge 2\). Also, \(F(y_1, y_2) = 0\)
if \(\min\{y_1, y_2\} < 0\).
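The three values in Example 5.2 follow from summing the entries of Table 5.1 over the
appropriate pairs. The sketch below assumes the nonzero table entries as given there; the
names table and joint_cdf are hypothetical.

```python
from fractions import Fraction

# Nonzero entries of Table 5.1, keyed by (y1, y2).
table = {
    (0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
    (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9),
    (0, 2): Fraction(1, 9),
}


def joint_cdf(a, b):
    """F(a, b) = sum of p(t1, t2) over all pairs with t1 <= a and t2 <= b."""
    return sum(
        (p for (t1, t2), p in table.items() if t1 <= a and t2 <= b),
        Fraction(0),
    )


print(joint_cdf(-1, 2), joint_cdf(1.5, 2), joint_cdf(5, 7))
```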


Two random variables are said to be jointly continuous if their joint distribution
function \(F(y_1, y_2)\) is continuous in both arguments.


DEFINITION 5.3

Let \(Y_1\) and \(Y_2\) be continuous random variables with joint distribution function
\(F(y_1, y_2)\). If there exists a nonnegative function \(f(y_1, y_2)\), such that

\[
F(y_1, y_2) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1,
\]

for all \(-\infty < y_1 < \infty\), \(-\infty < y_2 < \infty\), then \(Y_1\) and \(Y_2\) are said to be jointly
continuous random variables. The function \(f(y_1, y_2)\) is called the joint probability
density function.


Bivariate cumulative distribution functions satisfy a set of properties similar to
those specified for univariate cumulative distribution functions.

THEOREM 5.2 If $Y_1$ and $Y_2$ are random variables with joint distribution function $F(y_1, y_2)$, then
1. $F(-\infty, -\infty) = F(-\infty, y_2) = F(y_1, -\infty) = 0$.
2. $F(\infty, \infty) = 1$.
3. If $y_1^* \ge y_1$ and $y_2^* \ge y_2$, then
\[
F(y_1^*, y_2^*) - F(y_1^*, y_2) - F(y_1, y_2^*) + F(y_1, y_2) \ge 0.
\]

Part 3 follows because
\[
F(y_1^*, y_2^*) - F(y_1^*, y_2) - F(y_1, y_2^*) + F(y_1, y_2)
  = P(y_1 < Y_1 \le y_1^*,\; y_2 < Y_2 \le y_2^*) \ge 0.
\]

Notice that $F(\infty, \infty) \equiv \lim_{y_1 \to \infty} \lim_{y_2 \to \infty} F(y_1, y_2) = 1$ implies that the joint
density function $f(y_1, y_2)$ must be such that the integral of $f(y_1, y_2)$ over all values of
$(y_1, y_2)$ is 1.
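Part 3 of Theorem 5.2 is the bivariate version of "probability of an interval equals a difference of distribution function values." A short Python sketch makes the four-term combination concrete. As an assumed illustration, it uses the distribution function of a point chosen uniformly on the unit square, $F(y_1, y_2) = y_1 y_2$ on that square; the function names are hypothetical.

```python
def uniform_square_cdf(y1, y2):
    """Joint CDF of a point uniform on the unit square (an illustrative assumption)."""
    def clamp(t):
        return min(max(t, 0.0), 1.0)
    return clamp(y1) * clamp(y2)


def rectangle_probability(F, y1, y1_star, y2, y2_star):
    """P(y1 < Y1 <= y1*, y2 < Y2 <= y2*) computed from the joint CDF F."""
    return F(y1_star, y2_star) - F(y1_star, y2) - F(y1, y2_star) + F(y1, y2)


prob = rectangle_probability(uniform_square_cdf, 0.1, 0.3, 0.0, 0.5)
print(round(prob, 10))  # 0.1
```

The result is the area of the rectangle, as expected for a density of constant height 1.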


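THEOREM 5.3 If $Y_1$ and $Y_2$ are jointly continuous random variables with a joint density function
given by $f(y_1, y_2)$, then

1. $f(y_1, y_2) \ge 0$ for all $y_1, y_2$.
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1\, dy_2 = 1$.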


As in the univariate continuous case discussed in Chapter 4, the joint density
function may be intuitively interpreted as a model for the joint relative frequency
histogram for $Y_1$ and $Y_2$.

For the univariate continuous case, areas under the probability density over an
interval correspond to probabilities. Similarly, the bivariate probability density function
$f(y_1, y_2)$ traces a probability density surface over the $(y_1, y_2)$ plane (Figure 5.2).

FIGURE 5.2 A bivariate density function $f(y_1, y_2)$

Volumes under this surface correspond to probabilities. Thus, $P(a_1 \le Y_1 \le a_2,\; b_1 \le Y_2 \le b_2)$
is the shaded volume shown in Figure 5.2 and is equal to
\[
\int_{b_1}^{b_2} \int_{a_1}^{a_2} f(y_1, y_2)\, dy_1\, dy_2.
\]


EXAMPLE 5.3

Suppose that a radioactive particle is randomly located in a square with sides of unit
length. That is, if two regions within the unit square and of equal area are considered,
the particle is equally likely to be in either region. Let $Y_1$ and $Y_2$ denote the coordinates
of the particle's location. A reasonable model for the relative frequency histogram for
$Y_1$ and $Y_2$ is the bivariate analogue of the univariate uniform density function:
\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Sketch the probability density surface.
b Find $F(.2, .4)$.
c Find $P(.1 \le Y_1 \le .3,\; 0 \le Y_2 \le .5)$.

Solution

a The sketch is shown in Figure 5.3.

FIGURE 5.3 Geometric representation of $f(y_1, y_2)$, Example 5.3

b
\[
F(.2, .4) = \int_{-\infty}^{.4} \int_{-\infty}^{.2} f(y_1, y_2)\, dy_1\, dy_2
          = \int_{0}^{.4} \int_{0}^{.2} (1)\, dy_1\, dy_2
          = \int_{0}^{.4} \Bigl( y_1 \Bigr]_{0}^{.2} \Bigr) dy_2
          = \int_{0}^{.4} .2\, dy_2 = .08.
\]

The probability $F(.2, .4)$ corresponds to the volume under $f(y_1, y_2) = 1$, which is
shaded in Figure 5.3. As geometric considerations indicate, the desired probability
(volume) is equal to .08, which we obtained through integration at the beginning
of this part.

c
\[
P(.1 \le Y_1 \le .3,\; 0 \le Y_2 \le .5) = \int_{0}^{.5} \int_{.1}^{.3} f(y_1, y_2)\, dy_1\, dy_2
                                        = \int_{0}^{.5} \int_{.1}^{.3} 1\, dy_1\, dy_2 = .10.
\]

This probability corresponds to the volume under the density function
$f(y_1, y_2) = 1$ that is above the region $.1 \le y_1 \le .3$, $0 \le y_2 \le .5$. Like the
solution in part (b), the current solution can be obtained by using elementary
geometric concepts. The density or height of the surface is equal to 1, and hence the
desired probability (volume) is
\[
P(.1 \le Y_1 \le .3,\; 0 \le Y_2 \le .5) = (.2)(.5)(1) = .10.
\]
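Both parts of Example 5.3 can also be checked numerically. The sketch below approximates the double integral with a midpoint rule; `double_integral` is a hypothetical helper written for this illustration, not a library routine.

```python
def double_integral(f, a1, a2, b1, b2, n=200):
    """Midpoint-rule approximation of the integral of f over [a1, a2] x [b1, b2]."""
    h1 = (a2 - a1) / n
    h2 = (b2 - b1) / n
    total = 0.0
    for i in range(n):
        y1 = a1 + (i + 0.5) * h1
        for j in range(n):
            y2 = b1 + (j + 0.5) * h2
            total += f(y1, y2)
    return total * h1 * h2


def uniform_density(y1, y2):
    """Density of Example 5.3: height 1 on the unit square, 0 elsewhere."""
    return 1.0 if 0 <= y1 <= 1 and 0 <= y2 <= 1 else 0.0


part_b = double_integral(uniform_density, 0.0, 0.2, 0.0, 0.4)  # F(.2, .4)
part_c = double_integral(uniform_density, 0.1, 0.3, 0.0, 0.5)  # P(.1 <= Y1 <= .3, 0 <= Y2 <= .5)
print(round(part_b, 6), round(part_c, 6))  # 0.08 0.1
```

Because the density is constant over each region of integration, the midpoint rule reproduces the volumes .08 and .10 up to floating-point rounding.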


A slightly more complicated bivariate model is illustrated in the following example.

EXAMPLE 5.4

Gasoline is to be stocked in a bulk tank once at the beginning of each week and then
sold to individual customers. Let $Y_1$ denote the proportion of the capacity of the bulk
tank that is available after the tank is stocked at the beginning of the week. Because
of the limited supplies, $Y_1$ varies from week to week. Let $Y_2$ denote the proportion of
the capacity of the bulk tank that is sold during the week. Because $Y_1$ and $Y_2$ are both
proportions, both variables take on values between 0 and 1. Further, the amount sold,
$y_2$, cannot exceed the amount available, $y_1$. Suppose that the joint density function
for $Y_1$ and $Y_2$ is given by
\[
f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
A sketch of this function is given in Figure 5.4.

Find the probability that less than one-half of the tank will be stocked and more
than one-quarter of the tank will be sold.

FIGURE 5.4 The joint density function for Example 5.4

Solution

We want to find $P(0 \le Y_1 \le .5,\; Y_2 > .25)$. For any continuous random variable, the
probability of observing a value in a region is the volume under the density function
above the region of interest. The density function $f(y_1, y_2)$ is positive only in the
large triangular portion of the $(y_1, y_2)$ plane shown in Figure 5.5. We are interested
only in values of $y_1$ and $y_2$ such that $0 \le y_1 \le .5$ and $y_2 > .25$. The intersection of
this region and the region where the density function is positive is given by the small
(shaded) triangle in Figure 5.5. Consequently, the probability we desire is the volume
under the density function of Figure 5.4 above the shaded region in the $(y_1, y_2)$ plane
shown in Figure 5.5.

FIGURE 5.5 Region of integration for Example 5.4

Thus, we have
\[
\begin{aligned}
P(0 \le Y_1 \le .5,\; .25 \le Y_2)
  &= \int_{1/4}^{1/2} \int_{1/4}^{y_1} 3y_1\, dy_2\, dy_1 \\
  &= \int_{1/4}^{1/2} 3y_1 \Bigl( y_2 \Bigr]_{1/4}^{y_1} \Bigr) dy_1 \\
  &= \int_{1/4}^{1/2} 3y_1 (y_1 - 1/4)\, dy_1 \\
  &= \Bigl[ y_1^3 - (3/8) y_1^2 \Bigr]_{1/4}^{1/2} \\
  &= [(1/8) - (3/8)(1/4)] - [(1/64) - (3/8)(1/16)] \\
  &= 5/128.
\end{aligned}
\]
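The value 5/128 can be confirmed with exact arithmetic. After the inner integration, the outer integrand $3y_1(y_1 - 1/4)$ is a quadratic, and Simpson's rule integrates quadratics exactly, so a two-panel Simpson evaluation with `Fraction` values gives the exact answer. This is a sketch of one way to check the calculation, not the method used in the text; `simpson` and `inner_integral` are names chosen for the illustration.

```python
from fractions import Fraction


def simpson(g, a, b, n=2):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def inner_integral(y1):
    """Integral of 3*y1 over y2 from 1/4 to y1, which equals 3*y1*(y1 - 1/4)."""
    return 3 * y1 * (y1 - Fraction(1, 4))


prob = simpson(inner_integral, Fraction(1, 4), Fraction(1, 2))
print(prob)  # 5/128
```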


Calculating the probability specified in Example 5.4 involved integrating the joint
density function for $Y_1$ and $Y_2$ over the appropriate region. The specification of the
limits of integration was made easier by sketching the region of integration in Figure 5.5.
This approach, sketching the appropriate region of integration, often facilitates setting
up the appropriate integral.

The methods discussed in this section can be used to calculate the probability of
the intersection of two events $(Y_1 = y_1, Y_2 = y_2)$. In a like manner, we can define a
probability function (or probability density function) for the intersection of $n$ events
$(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$. The joint probability function corresponding to
the discrete case is given by
\[
p(y_1, y_2, \ldots, y_n) = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n).
\]


The joint density function of $Y_1, Y_2, \ldots, Y_n$ is given by $f(y_1, y_2, \ldots, y_n)$. As in
the bivariate case, these functions provide models for the joint relative frequency
distributions of the populations of joint observations $(y_1, y_2, \ldots, y_n)$ for the discrete
case and the continuous case, respectively. In the continuous case,
\[
P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n) = F(y_1, \ldots, y_n)
  = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f(t_1, t_2, \ldots, t_n)\, dt_n \cdots dt_1
\]
for every set of real numbers $(y_1, y_2, \ldots, y_n)$. Multivariate distribution functions
defined by this equality satisfy properties similar to those specified for the bivariate case.


Exercises

5.1 Contracts for two construction jobs are randomly assigned to one or more of three firms, A, B,
and C. Let $Y_1$ denote the number of contracts assigned to firm A and $Y_2$ the number of contracts
assigned to firm B. Recall that each firm can receive 0, 1, or 2 contracts.

a Find the joint probability function for $Y_1$ and $Y_2$.
b Find $F(1, 0)$.

5.2 Three balanced coins are tossed independently. One of the variables of interest is $Y_1$, the number
of heads. Let $Y_2$ denote the amount of money won on a side bet in the following manner. If the
first head occurs on the first toss, you win \$1. If the first head occurs on toss 2 or on toss 3 you
win \$2 or \$3, respectively. If no heads appear, you lose \$1 (that is, win $-$\$1).

a Find the joint probability function for $Y_1$ and $Y_2$.
b What is the probability that fewer than three heads will occur and you will win \$1 or less?
[That is, find $F(2, 1)$.]

5.3 Of nine executives in a business firm, four are married, three have never married, and two are
divorced. Three of the executives are to be selected for promotion. Let $Y_1$ denote the number
of married executives and $Y_2$ denote the number of never-married executives among the three
selected for promotion. Assuming that the three are randomly selected from the nine available,
find the joint probability function of $Y_1$ and $Y_2$.

5.4 Given here is the joint probability function associated with data obtained in a study of
automobile accidents in which a child (under age 5 years) was in the car and at least one fatality
occurred. Specifically, the study focused on whether or not the child survived and what type of
seatbelt (if any) he or she used. Define
\[
Y_1 = \begin{cases} 0, & \text{if the child survived,} \\ 1, & \text{if not,} \end{cases}
\qquad \text{and} \qquad
Y_2 = \begin{cases} 0, & \text{if no belt used,} \\ 1, & \text{if adult belt used,} \\ 2, & \text{if car-seat belt used.} \end{cases}
\]
Notice that $Y_1$ is the number of fatalities per child and, since children's car seats usually utilize
two belts, $Y_2$ is the number of seatbelts in use at the time of the accident.

             y1
  y2       0      1     Total
  0       .38    .17     .55
  1       .14    .02     .16
  2       .24    .05     .29
  Total   .76    .24    1.00

a Verify that the preceding probability function satisfies Theorem 5.1.
b Find $F(1, 2)$. What is the interpretation of this value?

5.5 Refer to Example 5.4. The joint density of $Y_1$, the proportion of the capacity of the tank that
is stocked at the beginning of the week, and $Y_2$, the proportion of the capacity sold during the
week, is given by
\[
f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $F(1/2, 1/3) = P(Y_1 \le 1/2, Y_2 \le 1/3)$.
b Find $P(Y_2 \le Y_1/2)$, the probability that the amount sold is less than half the amount
purchased.

5.6 Refer to Example 5.3. If a radioactive particle is randomly located in a square of unit length, a
reasonable model for the joint density function for $Y_1$ and $Y_2$ is
\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a What is $P(Y_1 - Y_2 > .5)$?
b What is $P(Y_1 Y_2 < .5)$?

5.7 Let $Y_1$ and $Y_2$ have joint density function
\[
f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\; y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

a What is $P(Y_1 < 1, Y_2 > 5)$?
b What is $P(Y_1 + Y_2 < 3)$?

5.8 Let $Y_1$ and $Y_2$ have the joint probability density function given by
\[
f(y_1, y_2) = \begin{cases} k y_1 y_2, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of $k$ that makes this a probability density function.
b Find the joint distribution function for $Y_1$ and $Y_2$.
c Find $P(Y_1 \le 1/2, Y_2 \le 3/4)$.

5.9 Let $Y_1$ and $Y_2$ have the joint probability density function given by
\[
f(y_1, y_2) = \begin{cases} k(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the value of $k$ that makes this a probability density function.
b Find $P(Y_1 \le 3/4, Y_2 \ge 1/2)$.

5.10 An environmental engineer measures the amount (by weight) of particulate pollution in air
samples of a certain volume collected over two smokestacks at a coal-operated power plant.
One of the stacks is equipped with a cleaning device. Let $Y_1$ denote the amount of pollutant
per sample collected above the stack that has no cleaning device and let $Y_2$ denote the amount
of pollutant per sample collected above the stack that is equipped with the cleaning device.
Suppose that the relative frequency behavior of $Y_1$ and $Y_2$ can be modeled by
\[
f(y_1, y_2) = \begin{cases} k, & 0 \le y_1 \le 2,\; 0 \le y_2 \le 1,\; 2y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases}
\]
That is, $Y_1$ and $Y_2$ are uniformly distributed over the region inside the triangle bounded by
$y_1 = 2$, $y_2 = 0$, and $2y_2 = y_1$.

a Find the value of $k$ that makes this function a probability density function.
b Find $P(Y_1 \ge 3Y_2)$. That is, find the probability that the cleaning device reduces the amount
of pollutant by one-third or more.

5.11 Suppose that $Y_1$ and $Y_2$ are uniformly distributed over the triangle shaded in the accompanying
diagram.

[Diagram: shaded triangle in the $(y_1, y_2)$ plane with vertices $(-1, 0)$, $(1, 0)$, and $(0, 1)$]

a Find $P(Y_1 \le 3/4, Y_2 \le 3/4)$.
b Find $P(Y_1 - Y_2 \ge 0)$.

5.12 Let $Y_1$ and $Y_2$ denote the proportions of two different types of components in a sample from
a mixture of chemicals used as an insecticide. Suppose that $Y_1$ and $Y_2$ have the joint density
function given by
\[
f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1,\; 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
(Notice that $Y_1 + Y_2 \le 1$ because the random variables denote proportions within the same
sample.) Find

a $P(Y_1 \le 3/4, Y_2 \le 3/4)$.
b $P(Y_1 \le 1/2, Y_2 \le 1/2)$.

5.13 The joint density function of $Y_1$ and $Y_2$ is given by
\[
f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\; 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $F(1/2, 1/2)$.
b Find $F(1/2, 2)$.
c Find $P(Y_1 > Y_2)$.

5.14 Suppose that the random variables $Y_1$ and $Y_2$ have joint probability density function $f(y_1, y_2)$
given by
\[
f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\; y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Verify that this is a valid joint density function.
b What is the probability that $Y_1 + Y_2$ is less than 1?


5.15 The management at a fast-food outlet is interested in the joint behavior of the random variables
$Y_1$, defined as the total time between a customer's arrival at the store and departure from the
service window, and $Y_2$, the time a customer waits in line before reaching the service window.
Because $Y_1$ includes the time a customer waits in line, we must have $Y_1 \ge Y_2$. The relative
frequency distribution of observed values of $Y_1$ and $Y_2$ can be modeled by the probability
density function
\[
f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere} \end{cases}
\]
with time measured in minutes. Find

a $P(Y_1 < 2, Y_2 > 1)$.
b $P(Y_1 \ge 2Y_2)$.
c $P(Y_1 - Y_2 \ge 1)$. (Notice that $Y_1 - Y_2$ denotes the time spent at the service window.)

5.16 Let $Y_1$ and $Y_2$ denote the proportions of time (out of one workday) during which employees I
and II, respectively, perform their assigned tasks. The joint relative frequency behavior of $Y_1$
and $Y_2$ is modeled by the density function
\[
f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $P(Y_1 < 1/2, Y_2 > 1/4)$.
b Find $P(Y_1 + Y_2 \le 1)$.

5.17 Let $(Y_1, Y_2)$ denote the coordinates of a point chosen at random inside a unit circle whose
center is at the origin. That is, $Y_1$ and $Y_2$ have a joint density function given by
\[
f(y_1, y_2) = \begin{cases} \dfrac{1}{\pi}, & y_1^2 + y_2^2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Find $P(Y_1 \le Y_2)$.

5.18 An electronic system has one each of two different types of components in joint operation. Let
$Y_1$ and $Y_2$ denote the random lengths of life of the components of type I and type II, respectively.
The joint density function is given by
\[
f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\; y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]
(Measurements are in hundreds of hours.) Find $P(Y_1 > 1, Y_2 > 1)$.


5.3 Marginal and Conditional Probability Distributions


Recall that the distinct values assumed by a discrete random variable represent
mutually exclusive events. Similarly, for all distinct pairs of values $y_1, y_2$, the bivariate
events $(Y_1 = y_1, Y_2 = y_2)$, represented by $(y_1, y_2)$, are mutually exclusive events. It
follows that the univariate event $(Y_1 = y_1)$ is the union of bivariate events of the type
$(Y_1 = y_1, Y_2 = y_2)$, with the union being taken over all possible values for $y_2$.

For example, reconsider the die-tossing experiment of Section 5.2, where

$Y_1$ = number of dots on the upper face of die 1,
$Y_2$ = number of dots on the upper face of die 2.

Then
\[
\begin{aligned}
P(Y_1 = 1) &= p(1, 1) + p(1, 2) + p(1, 3) + \cdots + p(1, 6) \\
           &= 1/36 + 1/36 + 1/36 + \cdots + 1/36 = 6/36 = 1/6, \\
P(Y_1 = 2) &= p(2, 1) + p(2, 2) + p(2, 3) + \cdots + p(2, 6) = 1/6, \\
           &\;\;\vdots \\
P(Y_1 = 6) &= p(6, 1) + p(6, 2) + p(6, 3) + \cdots + p(6, 6) = 1/6.
\end{aligned}
\]

Expressed in summation notation, probabilities about the variable $Y_1$ alone are
\[
P(Y_1 = y_1) = p_1(y_1) = \sum_{y_2 = 1}^{6} p(y_1, y_2).
\]

Similarly, probabilities corresponding to values of the variable $Y_2$ alone are given by
\[
p_2(y_2) = P(Y_2 = y_2) = \sum_{y_1 = 1}^{6} p(y_1, y_2).
\]

Summation in the discrete case corresponds to integration in the continuous case,
which leads us to the following definition.

DEFINITION 5.4

a Let $Y_1$ and $Y_2$ be jointly discrete random variables with probability function
$p(y_1, y_2)$. Then the marginal probability functions of $Y_1$ and $Y_2$, respectively,
are given by
\[
p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \qquad \text{and} \qquad p_2(y_2) = \sum_{\text{all } y_1} p(y_1, y_2).
\]
b Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density function
$f(y_1, y_2)$. Then the marginal density functions of $Y_1$ and $Y_2$, respectively, are
given by
\[
f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \qquad \text{and} \qquad f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1.
\]

The term marginal, as applied to the univariate probability functions of $Y_1$ and
$Y_2$, has intuitive meaning. To find $p_1(y_1)$, we sum $p(y_1, y_2)$ over all values of $y_2$
and hence accumulate the probabilities on the $y_1$ axis (or margin). The discrete and
continuous cases are illustrated in the following two examples.

EXAMPLE 5.5

From a group of three Republicans, two Democrats, and one independent, a committee
of two people is to be randomly selected. Let $Y_1$ denote the number of Republicans
and $Y_2$ denote the number of Democrats on the committee. Find the joint probability
function of $Y_1$ and $Y_2$ and then find the marginal probability function of $Y_1$.

Solution

The probabilities sought here are similar to the hypergeometric probabilities of
Chapter 3. For example,
\[
P(Y_1 = 1, Y_2 = 1) = p(1, 1) = \frac{\binom{3}{1}\binom{2}{1}\binom{1}{0}}{\binom{6}{2}} = \frac{3(2)}{15} = \frac{6}{15}
\]
because there are 15 equally likely sample points; for the event in question we must
select one Republican from the three, one Democrat from the two, and zero
independents. Similar calculations lead to the other probabilities shown in Table 5.2.

To find $p_1(y_1)$, we must sum over the values of $Y_2$, as Definition 5.4 indicates.
Hence, these probabilities are given by the column totals in Table 5.2. That is,
\[
p_1(0) = p(0, 0) + p(0, 1) + p(0, 2) = 0 + 2/15 + 1/15 = 3/15.
\]
Similarly,
\[
p_1(1) = 9/15 \qquad \text{and} \qquad p_1(2) = 3/15.
\]
Analogously, the marginal probability function of $Y_2$ is given by the row totals.

Table 5.2 Joint probability function for $Y_1$ and $Y_2$, Example 5.5

                  y1
  y2        0       1       2      Total
  0         0      3/15    3/15    6/15
  1        2/15    6/15     0      8/15
  2        1/15     0       0      1/15
  Total    3/15    9/15    3/15      1
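The entries and margins of Table 5.2 can be reproduced with a few lines of Python. This is a sketch under the counting argument of Example 5.5; the function name `joint_pmf` and the dictionaries `p1` and `p2` are illustrative names, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def joint_pmf(y1, y2):
    """p(y1, y2) for a committee of 2 drawn from 3 Republicans, 2 Democrats, 1 independent."""
    y3 = 2 - y1 - y2  # number of independents on the committee
    if min(y1, y2, y3) < 0:
        return Fraction(0)
    return Fraction(comb(3, y1) * comb(2, y2) * comb(1, y3), comb(6, 2))


values = range(3)
# Marginal of Y1: sum over y2 (column totals of Table 5.2).
p1 = {y1: sum(joint_pmf(y1, y2) for y2 in values) for y1 in values}
# Marginal of Y2: sum over y1 (row totals of Table 5.2).
p2 = {y2: sum(joint_pmf(y1, y2) for y1 in values) for y2 in values}

print(p1)  # fractions are shown in lowest terms, e.g. 3/15 prints as 1/5
print(p2)
```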
EXAMPLE 5.6

Let
\[
f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Sketch $f(y_1, y_2)$ and find the marginal density functions for $Y_1$ and $Y_2$.

Solution

Viewed geometrically, $f(y_1, y_2)$ traces a wedge-shaped surface, as sketched in
Figure 5.6.

FIGURE 5.6 Geometric representation of $f(y_1, y_2)$, Example 5.6

Before applying Definition 5.4 to find $f_1(y_1)$ and $f_2(y_2)$, we will use Figure 5.6
to visualize the result. If the probability represented by the wedge were accumulated
on the $y_1$ axis (accumulating probability along lines parallel to the $y_2$ axis), the result
would be a triangular probability density that would look like the side of the wedge
in Figure 5.6. If the probability were accumulated along the $y_2$ axis (accumulating
along lines parallel to the $y_1$ axis), the resulting density would be uniform. We will
confirm these visual solutions by applying Definition 5.4. Then, if $0 \le y_1 \le 1$,
\[
f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 2y_1\, dy_2 = 2y_1 \Bigl( y_2 \Bigr]_{0}^{1} \Bigr) = 2y_1,
\]
and if $y_1 < 0$ or $y_1 > 1$,
\[
f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 0\, dy_2 = 0.
\]
Thus,
\[
f_1(y_1) = \begin{cases} 2y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Similarly, if $0 \le y_2 \le 1$,
\[
f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 2y_1\, dy_1 = y_1^2 \Bigr]_{0}^{1} = 1,
\]
and if $y_2 < 0$ or $y_2 > 1$,
\[
f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 0\, dy_1 = 0.
\]
Summarizing,
\[
f_2(y_2) = \begin{cases} 1, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Graphs of $f_1(y_1)$ and $f_2(y_2)$ trace triangular and uniform probability densities,
respectively, as expected.
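The marginal densities of Example 5.6 can be checked numerically by integrating out one variable at a time. The sketch below uses a simple midpoint rule; `integrate`, `marginal_y1`, and `marginal_y2` are hypothetical helper names, and the integration range is restricted to $[0, 1]$ on the assumption (true here) that the density vanishes outside the unit square.

```python
def density(y1, y2):
    """Joint density of Example 5.6: 2*y1 on the unit square, 0 elsewhere."""
    return 2.0 * y1 if 0 <= y1 <= 1 and 0 <= y2 <= 1 else 0.0


def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def marginal_y1(y1):
    """f1(y1): integrate the joint density over y2."""
    return integrate(lambda y2: density(y1, y2), 0.0, 1.0)


def marginal_y2(y2):
    """f2(y2): integrate the joint density over y1."""
    return integrate(lambda y1: density(y1, y2), 0.0, 1.0)


print(round(marginal_y1(0.3), 6))  # 0.6, matching f1(y1) = 2*y1
print(round(marginal_y2(0.7), 6))  # 1.0, matching the uniform marginal
```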


We now turn our attention to conditional distributions, looking first at the discrete
case.

The multiplicative law (Section 2.8) gives the probability of the intersection
$A \cap B$ as
\[
P(A \cap B) = P(A) P(B|A),
\]
where $P(A)$ is the unconditional probability of $A$ and $P(B|A)$ is the probability of $B$
given that $A$ has occurred. Now consider the intersection of the two numerical events,
$(Y_1 = y_1)$ and $(Y_2 = y_2)$, represented by the bivariate event $(y_1, y_2)$. It follows
directly from the multiplicative law of probability that the bivariate probability for
the intersection $(y_1, y_2)$ is
\[
p(y_1, y_2) = p_1(y_1)\, p(y_2|y_1) = p_2(y_2)\, p(y_1|y_2).
\]

The probabilities $p_1(y_1)$ and $p_2(y_2)$ are associated with the univariate probability
distributions for $Y_1$ and $Y_2$ individually (recall Chapter 3). Using the interpretation
of conditional probability discussed in Chapter 2, $p(y_1|y_2)$ is the probability that the
random variable $Y_1$ equals $y_1$, given that $Y_2$ takes on the value $y_2$.

DEFINITION 5.5

If $Y_1$ and $Y_2$ are jointly discrete random variables with joint probability function
$p(y_1, y_2)$ and marginal probability functions $p_1(y_1)$ and $p_2(y_2)$, respectively,
then the conditional discrete probability function of $Y_1$ given $Y_2$ is
\[
p(y_1|y_2) = P(Y_1 = y_1 | Y_2 = y_2) = \frac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_2 = y_2)} = \frac{p(y_1, y_2)}{p_2(y_2)},
\]
provided that $p_2(y_2) > 0$.

Thus, $P(Y_1 = 2 | Y_2 = 3)$ is the conditional probability that $Y_1 = 2$ given that $Y_2 = 3$.
A similar interpretation can be attached to the conditional probability $p(y_2|y_1)$. Note
that $p(y_1|y_2)$ is undefined if $p_2(y_2) = 0$.
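Definition 5.5 amounts to rescaling one row of the joint table by its row total. As a minimal sketch, the code below applies it to the joint probabilities of Table 5.2 and conditions on $Y_2 = 0$; the dictionary layout and the function name `conditional_y1_given_y2` are assumptions made for the illustration.

```python
from fractions import Fraction as F

# Joint probabilities from Table 5.2, keyed by (y1, y2).
joint = {
    (0, 0): F(0),     (1, 0): F(3, 15), (2, 0): F(3, 15),
    (0, 1): F(2, 15), (1, 1): F(6, 15), (2, 1): F(0),
    (0, 2): F(1, 15), (1, 2): F(0),     (2, 2): F(0),
}


def conditional_y1_given_y2(joint, y2):
    """Return {y1: p(y1 | y2)} = p(y1, y2) / p2(y2); undefined when p2(y2) = 0."""
    p2 = sum(p for (_, t2), p in joint.items() if t2 == y2)
    if p2 == 0:
        raise ValueError("p2(y2) = 0, so p(y1 | y2) is undefined")
    return {t1: p / p2 for (t1, t2), p in joint.items() if t2 == y2}


print(conditional_y1_given_y2(joint, 0))  # conditional probabilities 0, 1/2, 1/2
```

Calling the function with a value of $y_2$ that has zero marginal probability raises an error, which mirrors the "undefined" case in the definition.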


EXAMPLE 5.7

Refer to Example 5.5 and find the conditional distribution of $Y_1$ given that $Y_2 = 1$.
That is, given that one of the two people on the committee is a Democrat, find the
conditional distribution for the number of Republicans selected for the committee.

Solution

The joint probabilities are given in Table 5.2. To find $p(y_1 | Y_2 = 1)$, we concentrate
on the row corresponding to $Y_2 = 1$. Then
\[
P(Y_1 = 0 | Y_2 = 1) = \frac{p(0, 1)}{p_2(1)} = \frac{2/15}{8/15} = \frac{1}{4},
\]
\[
P(Y_1 = 1 | Y_2 = 1) = \frac{p(1, 1)}{p_2(1)} = \frac{6/15}{8/15} = \frac{3}{4},
\]
and
\[
P(Y_1 \ge 2 | Y_2 = 1) = \frac{p(2, 1)}{p_2(1)} = \frac{0}{8/15} = 0.
\]

In the randomly selected committee, if one person is a Democrat (equivalently, if
$Y_2 = 1$), there is a high probability that the other will be a Republican (equivalently,
$Y_1 = 1$).

In the continuous case, we can obtain an appropriate analogue of the conditional
probability function $p(y_1|y_2)$, but it is not obtained in such a straightforward manner.
If $Y_1$ and $Y_2$ are continuous, $P(Y_1 = y_1 | Y_2 = y_2)$ cannot be defined as in the discrete
case because both $(Y_1 = y_1)$ and $(Y_2 = y_2)$ are events with zero probability. The
following considerations, however, do lead to a useful and consistent definition for a
conditional density function.

Assuming that $Y_1$ and $Y_2$ are jointly continuous with density function $f(y_1, y_2)$,
we might be interested in a probability of the form
\[
P(Y_1 \le y_1 | Y_2 = y_2) = F(y_1|y_2),
\]
which, as a function of $y_1$ for a fixed $y_2$, is called the conditional distribution function
of $Y_1$, given $Y_2 = y_2$.

DEFINITION 5.6

If $Y_1$ and $Y_2$ are jointly continuous random variables with joint density function
$f(y_1, y_2)$, then the conditional distribution function of $Y_1$ given $Y_2 = y_2$ is
\[
F(y_1|y_2) = P(Y_1 \le y_1 | Y_2 = y_2).
\]

Notice that $F(y_1|y_2)$ is a function of $y_1$ for a fixed value of $y_2$.

If we could take $F(y_1|y_2)$, multiply by $P(Y_2 = y_2)$ for each possible value of $Y_2$,
and sum all the resulting probabilities, we would obtain $F_1(y_1)$. This is not possible
because the number of values for $y_2$ is uncountable and all probabilities $P(Y_2 = y_2)$
are zero. But we can do something analogous by multiplying by $f_2(y_2)$ and then
integrating to obtain
\[
F_1(y_1) = \int_{-\infty}^{\infty} F(y_1|y_2)\, f_2(y_2)\, dy_2.
\]

The quantity $f_2(y_2)\, dy_2$ can be thought of as the approximate probability that $Y_2$ takes
on a value in a small interval about $y_2$, and the integral is a generalized sum.

Now from previous considerations, we know that
\[
F_1(y_1) = \int_{-\infty}^{y_1} f_1(t_1)\, dt_1
         = \int_{-\infty}^{y_1} \left[ \int_{-\infty}^{\infty} f(t_1, y_2)\, dy_2 \right] dt_1
         = \int_{-\infty}^{\infty} \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1\, dy_2.
\]

From these two expressions for $F_1(y_1)$, we must have
\[
F(y_1|y_2)\, f_2(y_2) = \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1
\]
or
\[
F(y_1|y_2) = \int_{-\infty}^{y_1} \frac{f(t_1, y_2)}{f_2(y_2)}\, dt_1.
\]

We will call the integrand of this expression the conditional density function of $Y_1$
given $Y_2 = y_2$, and we will denote it by $f(y_1|y_2)$.

DEFINITION 5.7

Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density $f(y_1, y_2)$
and marginal densities $f_1(y_1)$ and $f_2(y_2)$, respectively. For any $y_2$ such that
$f_2(y_2) > 0$, the conditional density of $Y_1$ given $Y_2 = y_2$ is given by
\[
f(y_1|y_2) = \frac{f(y_1, y_2)}{f_2(y_2)}
\]
and, for any $y_1$ such that $f_1(y_1) > 0$, the conditional density of $Y_2$ given $Y_1 = y_1$
is given by
\[
f(y_2|y_1) = \frac{f(y_1, y_2)}{f_1(y_1)}.
\]

Note that the conditional density $f(y_1|y_2)$ is undefined for all $y_2$ such that
$f_2(y_2) = 0$. Similarly, $f(y_2|y_1)$ is undefined if $y_1$ is such that $f_1(y_1) = 0$.


EXAMPLE 5.8

A soft-drink machine has a random amount $Y_2$ in supply at the beginning of a given
day and dispenses a random amount $Y_1$ during the day (with measurements in gallons).
It is not resupplied during the day, and hence $Y_1 \le Y_2$. It has been observed that $Y_1$
and $Y_2$ have a joint density given by
\[
f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]
That is, the points $(y_1, y_2)$ are uniformly distributed over the triangle with the given
boundaries. Find the conditional density of $Y_1$ given $Y_2 = y_2$. Evaluate the probability
that less than 1/2 gallon will be sold, given that the machine contains 1.5 gallons at
the start of the day.

Solution

The marginal density of $Y_2$ is given by
\[
f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1.
\]
Thus,
\[
f_2(y_2) = \begin{cases}
  \displaystyle \int_{0}^{y_2} (1/2)\, dy_1 = (1/2)\, y_2, & 0 \le y_2 \le 2, \\[2ex]
  \displaystyle \int_{-\infty}^{\infty} 0\, dy_1 = 0, & \text{elsewhere.}
\end{cases}
\]

Note that $f_2(y_2) > 0$ if and only if $0 < y_2 \le 2$. Thus, for any $0 < y_2 \le 2$, using
Definition 5.7,
\[
f(y_1|y_2) = \frac{f(y_1, y_2)}{f_2(y_2)} = \frac{1/2}{(1/2)(y_2)} = \frac{1}{y_2}, \qquad 0 \le y_1 \le y_2.
\]

Also, $f(y_1|y_2)$ is undefined if $y_2 \le 0$ or $y_2 > 2$. The probability of interest is
\[
P(Y_1 \le 1/2 \,|\, Y_2 = 1.5) = \int_{-\infty}^{1/2} f(y_1 | y_2 = 1.5)\, dy_1 = \int_{0}^{1/2} \frac{1}{1.5}\, dy_1 = \frac{1/2}{1.5} = \frac{1}{3}.
\]

If the machine contains 2 gallons at the start of the day, then
\[
P(Y_1 \le 1/2 \,|\, Y_2 = 2) = \int_{0}^{1/2} \frac{1}{2}\, dy_1 = \frac{1}{4}.
\]

Thus, the conditional probability that $Y_1 \le 1/2$ given $Y_2 = y_2$ changes appreciably
depending on the particular choice of $y_2$.
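The two conditional probabilities in Example 5.8 follow from dividing the joint density by the marginal density of $Y_2$ and integrating over $y_1$. The sketch below encodes those closed forms with exact fractions; the function names are illustrative, and the formula $f_2(y_2) = y_2/2$ is taken from the solution above rather than recomputed.

```python
from fractions import Fraction


def joint_density(y1, y2):
    """Joint density of Example 5.8: 1/2 on the triangle 0 <= y1 <= y2 <= 2."""
    return Fraction(1, 2) if 0 <= y1 <= y2 <= 2 else Fraction(0)


def marginal_y2(y2):
    """f2(y2) = y2/2 for 0 <= y2 <= 2 and 0 elsewhere (from the worked solution)."""
    return Fraction(y2) / 2 if 0 <= y2 <= 2 else Fraction(0)


def conditional_cdf(y1, y2):
    """P(Y1 <= y1 | Y2 = y2): the constant conditional density times the interval length."""
    f2 = marginal_y2(y2)
    if f2 == 0:
        raise ValueError("f2(y2) = 0, so the conditional density is undefined")
    length = min(max(Fraction(y1), 0), Fraction(y2))  # clip y1 to [0, y2]
    return joint_density(0, y2) / f2 * length


print(conditional_cdf(Fraction(1, 2), Fraction(3, 2)))  # 1/3
print(conditional_cdf(Fraction(1, 2), 2))               # 1/4
```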


5.19 


5.20 


5.21 


5.22 


Exercises 


In Exercise 5.1, we determined that the joint distribution of Y,, the number of contracts awarded 
to firm A, and У, the number of contracts awarded to firm B, is given by the entries in the 
following table. 


У 
У? 0 1 2 
0 1/9 2/9 1/9 
1 2/9 2/9 0 
2 1/9 0 0 


a Find the marginal probability distribution of У. 


b According to results in Chapter 4, Y; has a binomial distribution with n = 2 and p = 1/3. 
Is there any conflict between this result and the answer you provided in part (a)? 


Refer to Exercise 5.2. 


a Derive the marginal probability distribution for your winnings on the side bet. 


b What is the probability that you obtained three heads, given that you won $1 on the side bet? 


In Exercise 5.3, we determined that the joint probability distribution of Y;, the number of 
married executives, and Y>, the number of never-married executives, is given by 


ОЈ) 
Ө 


where y; and у» are integers, 0 < y; < 3,0 < y2 <3,and1< у + у <3. 


pou уз) = 


a Find the marginal probability distribution of Y;, the number of married executives among 
the three selected for promotion. 


Find P(Y, = 1|Y; = 2). 
If we let Уз denote the number of divorced executives among the three selected for promo- 
tion, then Уз = 3 — Y; — №. Find P(Y; = 1|У» = 1). 

d Compare the marginal distribution derived in (a) with the hypergeometric distributions with 
N =9,n = 3, and r = 4 encountered in Section 3.7. 


In Exercise 5.4, you were given the following joint probability function for 


0, if no belt used, 
0, if child survived, 
Ү = and № = { 1, if adult belt used, 
1, if not, 
2, if car-seat belt used. 


5.23 


5.24 


5.25 
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y 
yo 0 1 Total 


0 38 17 55 
1 14 02 16 
2 24.05 29 


Total .76 .24 1.00 


a Give the marginal probability functions for Y, and Y3. 
b Give the conditional probability function for Y; given Y, = 0. 
c Whatis the probability that a child survived given that he or she was in a car-seat belt? 


In Example 5.4 and Exercise 5.5, we considered the joint density of Y;, the proportion of the 
capacity of the tank that is stocked at the beginning of the week, and У, the proportion of the 
capacity sold during the week, given by 


Зу, OSwmsy <1, 
0, elsewhere. 


fOr, уз) = | 


a Find the marginal density function for Y». 
b For what values of у» is the conditional density f(y;|y») defined? 


c Whatis the probability that more than half a tank is sold given that three-fourths of a tank 
is stocked? 


In Exercise 5.6, we assumed that if a radioactive particle is randomly located in a square with 
sides of unit length, a reasonable model for the joint density function for Y; and Y» is 


1l, O<y<10<y <1, 
0, elsewhere. 


fOr, y2) = | 


Find the marginal density functions for Y, and Y>. 

What is P(.3 < Yı < .5)? P(.3 < Yo < .5)? 

For what values of у» is the conditional density f (yi|y») defined? 

For апу y2, 0 < y» < 1 what is the conditional density function of Y, given that Yo = у? 
Find P(.3 < Y; < .5|У = .3). 

Find P(.3 < Y, < .5|У = .5). 

Compare the answers that you obtained in parts (а), (d), and (e). For any y2,0 < у < 1 
how does P(.3 < Ү < .5) compare to Р(.3 < Y, < .5|Y2 = у)? 


coo aA Бә 


Let Y; and Y; have joint density function first encountered in Exercise 5.7: 


e Ort», y 0, y > 0, 


ХО, уз) = 


0, elsewhere. 


a Find the marginal density functions for Y, and Y2. Identify these densities as one of those 
studied in Chapter 4. 


What is P(1 < Y; < 2.5)? P(1 < Y, < 2.5)? 
For what values of у» is the conditional density f (yı|y2) defined? 


For any у» > 0, what is the conditional density function of Y, given that У = y2? 


oana c 


For any y; > 0, what is the conditional density function of Y? given that Y; = у? 
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5.26 


5.27 


5.28 


5.29 


5.30 


f For any y > 0, how does the conditional density function f (yı|y2) that you obtained in 
part (d) compare to the marginal density function fı(yı) found in part (a)? 


g What does your answer to part (f) imply about marginal and conditional probabilities that 
Y, falls in any interval? 


In Exercise 5.8, we derived the fact that 


4уу, OxyxzLOzy»czli 
РО, уз) = 


0, elsewhere 


is a valid joint probability density function. Find 


the marginal density functions for Y; and Y3. 

P(Y, < 1/2|¥2 > 3/4). 

the conditional density function of Y, given Y; = y». 
the conditional density function of У given Ү = yı. 


P(Y, x 3/4|Ү» = 1/2). 


ооо 7 ы 


In Exercise 5.9, we determined that 


61 —y2), 0< у= у <1, 


Ол, уз) = 
Mae 0, elsewhere 


is a valid joint probability density function. Find 


the marginal density functions for Y; and №». 

P < 1/2|% < 3/4). 

the conditional density function of Y; given Y; = y». 
the conditional density function of У given Ү = yı. 
Р(Ү > 3/4|Ү, = 1/2). 


ооо mB 


In Exercise 5.10, we proved that 


1, 0< у <2,0< у < 1,2у› < у, 


РО, у) = 


0, elsewhere 


is a valid joint probability density function for Y,, the amount of pollutant per sample collected 
above the stack without the cleaning device, and for Y2, the amount collected above the stack 
with the cleaner. 


a If we consider the stack with the cleaner installed, find the probability that the amount of 
pollutant in a given sample will exceed .5. 


b Given that the amount of pollutant in a sample taken above the stack with the cleaner is 
observed to be 0.5, find the probability that the amount of pollutant exceeds 1.5 above the 
other stack (without the cleaner). 


Refer to Exercise 5.11. Find 


a the marginal density functions for Y; and Y». 
b P(Y > 1/2|Ү, = 1/4). 


In Exercise 5.12, we were given the following joint probability density function for the random
variables $Y_1$ and $Y_2$, which were the proportions of two components in a sample from a mixture
of insecticide:

\[
f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $P(Y_1 \ge 1/2 \mid Y_2 \le 1/4)$.
b Find $P(Y_1 \ge 1/2 \mid Y_2 = 1/4)$.


5.31 In Exercise 5.13, the joint density function of $Y_1$ and $Y_2$ is given by

\[
f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\ 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Show that the marginal density of $Y_1$ is a beta density with $\alpha = 2$ and $\beta = 4$.
b Derive the marginal density of $Y_2$.
c Derive the conditional density of $Y_2$ given $Y_1 = y_1$.
d Find $P(Y_2 > 0 \mid Y_1 = .75)$.


5.32 Suppose that the random variables $Y_1$ and $Y_2$ have joint probability density function, $f(y_1, y_2)$,
given by (see Exercise 5.14)

\[
f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\ y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Show that the marginal density of $Y_1$ is a beta density with $\alpha = 3$ and $\beta = 2$.
b Derive the marginal density of $Y_2$.
c Derive the conditional density of $Y_2$ given $Y_1 = y_1$.
d Find $P(Y_2 < 1.1 \mid Y_1 = .60)$.


5.33 Suppose that $Y_1$ is the total time between a customer's arrival in the store and departure from the
service window, $Y_2$ is the time spent in line before reaching the window, and the joint density
of these variables (as was given in Exercise 5.15) is

\[
f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the marginal density functions for $Y_1$ and $Y_2$.

b What is the conditional density function of $Y_1$ given that $Y_2 = y_2$? Be sure to specify the
values of $y_2$ for which this conditional density is defined.

c What is the conditional density function of $Y_2$ given that $Y_1 = y_1$? Be sure to specify the
values of $y_1$ for which this conditional density is defined.

d Is the conditional density function $f(y_1 \mid y_2)$ that you obtained in part (b) the same as the
marginal density function $f_1(y_1)$ found in part (a)?

e What does your answer to part (d) imply about marginal and conditional probabilities that
$Y_1$ falls in any interval?


5.34 If $Y_1$ is uniformly distributed on the interval (0, 1) and, for $0 < y_1 < 1$,

\[
f(y_2 \mid y_1) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1, \\ 0, & \text{elsewhere,} \end{cases}
\]

a what is the "name" of the conditional distribution of $Y_2$ given $Y_1 = y_1$?
b find the joint density function of $Y_1$ and $Y_2$.


c find the marginal density function for $Y_2$.


5.35 Refer to Exercise 5.33. If two minutes elapse between a customer's arrival at the store and his
departure from the service window, find the probability that he waited in line less than one
minute to reach the window.


5.36 In Exercise 5.16, $Y_1$ and $Y_2$ denoted the proportions of time during which employees I and II
actually performed their assigned tasks during a workday. The joint density of $Y_1$ and $Y_2$ is
given by

\[
f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find the marginal density functions for $Y_1$ and $Y_2$.
b Find $P(Y_1 \ge 1/2 \mid Y_2 \ge 1/2)$.
c If employee II spends exactly 50% of the day working on assigned duties, find the probability
that employee I spends more than 75% of the day working on similar duties.


5.37 In Exercise 5.18, $Y_1$ and $Y_2$ denoted the lengths of life, in hundreds of hours, for components of
types I and II, respectively, in an electronic system. The joint density of $Y_1$ and $Y_2$ is given by

\[
f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find the probability that a component of type II will have a life length in excess of 200 hours.


5.38 Let $Y_1$ denote the weight (in tons) of a bulk item stocked by a supplier at the beginning of a
week and suppose that $Y_1$ has a uniform distribution over the interval $0 \le y_1 \le 1$. Let $Y_2$ denote
the amount (by weight) of this item sold by the supplier during the week and suppose that $Y_2$
has a uniform distribution over the interval $0 \le y_2 \le y_1$, where $y_1$ is a specific value of $Y_1$.

a Find the joint density function for $Y_1$ and $Y_2$.

b If the supplier stocks a half-ton of the item, what is the probability that she sells more than
a quarter-ton?

c If it is known that the supplier sold a quarter-ton of the item, what is the probability that
she had stocked more than a half-ton?


*5.39 Suppose that $Y_1$ and $Y_2$ are independent Poisson distributed random variables with means $\lambda_1$
and $\lambda_2$, respectively. Let $W = Y_1 + Y_2$. In Chapter 6 you will show that $W$ has a Poisson
distribution with mean $\lambda_1 + \lambda_2$. Use this result to show that the conditional distribution of $Y_1$,
given that $W = w$, is a binomial distribution with $n = w$ and $p = \lambda_1/(\lambda_1 + \lambda_2)$.


*5.40 Suppose that $Y_1$ and $Y_2$ are independent binomial distributed random variables based on samples
of sizes $n_1$ and $n_2$, respectively. Suppose that $p_1 = p_2 = p$. That is, the probability of "success"
is the same for the two random variables. Let $W = Y_1 + Y_2$. In Chapter 6 you will prove that
$W$ has a binomial distribution with success probability $p$ and sample size $n_1 + n_2$. Use this
result to show that the conditional distribution of $Y_1$, given that $W = w$, is a hypergeometric
distribution with $N = n_1 + n_2$, $n = w$, and $r = n_1$.


*5.41 A quality control plan calls for randomly selecting three items from the daily production
(assumed large) of a certain machine and observing the number of defectives. However, the
proportion $p$ of defectives produced by the machine varies from day to day and is assumed
to have a uniform distribution on the interval (0, 1). For a randomly chosen day, find the
unconditional probability that exactly two defectives are observed in the sample.


1. Exercises preceded by an asterisk are optional. 


*5.42 The number of defects per yard $Y$ for a certain fabric is known to have a Poisson distribution
with parameter $\lambda$. However, $\lambda$ itself is a random variable with probability density function
given by

\[
f(\lambda) = \begin{cases} e^{-\lambda}, & \lambda \ge 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find the unconditional probability function for $Y$.


5.4 Independent Random Variables


In Example 5.8 we saw two dependent random variables, for which probabilities associated
with $Y_1$ depended on the observed value of $Y_2$. In Exercise 5.24 (and some
others), this was not the case: Probabilities associated with $Y_1$ were the same, regardless
of the observed value of $Y_2$. We now present a formal definition of independence
of random variables.

Two events $A$ and $B$ are independent if $P(A \cap B) = P(A) \times P(B)$. When
discussing random variables, if $a < b$ and $c < d$ we are often concerned with events
of the type $(a < Y_1 \le b) \cap (c < Y_2 \le d)$. For consistency with the earlier definition
of independent events, if $Y_1$ and $Y_2$ are independent, we would like to have

\[
P(a < Y_1 \le b,\ c < Y_2 \le d) = P(a < Y_1 \le b) \times P(c < Y_2 \le d)
\]

for any choice of real numbers $a < b$ and $c < d$. That is, if $Y_1$ and $Y_2$ are independent,
the joint probability can be written as the product of the marginal probabilities. This
property will be satisfied if $Y_1$ and $Y_2$ are independent in the sense detailed in the
following definition.


DEFINITION 5.8 Let $Y_1$ have distribution function $F_1(y_1)$, $Y_2$ have distribution function $F_2(y_2)$,
and $Y_1$ and $Y_2$ have joint distribution function $F(y_1, y_2)$. Then $Y_1$ and $Y_2$ are
said to be independent if and only if

\[
F(y_1, y_2) = F_1(y_1) F_2(y_2)
\]

for every pair of real numbers $(y_1, y_2)$.
If $Y_1$ and $Y_2$ are not independent, they are said to be dependent.


It usually is convenient to establish independence, or the lack of it, by using the
result contained in the following theorem. The proof is omitted; see "References and
Further Readings" at the end of the chapter.


THEOREM 5.4 If $Y_1$ and $Y_2$ are discrete random variables with joint probability function
$p(y_1, y_2)$ and marginal probability functions $p_1(y_1)$ and $p_2(y_2)$, respectively,
then $Y_1$ and $Y_2$ are independent if and only if

\[
p(y_1, y_2) = p_1(y_1) p_2(y_2)
\]

for all pairs of real numbers $(y_1, y_2)$.
If $Y_1$ and $Y_2$ are continuous random variables with joint density function $f(y_1, y_2)$
and marginal density functions $f_1(y_1)$ and $f_2(y_2)$, respectively, then $Y_1$ and $Y_2$
are independent if and only if

\[
f(y_1, y_2) = f_1(y_1) f_2(y_2)
\]

for all pairs of real numbers $(y_1, y_2)$.


We now illustrate the concept of independence with some examples.


EXAMPLE 5.9 For the die-tossing problem of Section 5.2, show that $Y_1$ and $Y_2$ are independent.


Solution In this problem each of the 36 sample points was given probability 1/36. Consider, for
example, the point (1, 2). We know that $p(1, 2) = 1/36$. Also, $p_1(1) = P(Y_1 = 1) = 1/6$
and $p_2(2) = P(Y_2 = 2) = 1/6$. Hence,

\[
p(1, 2) = p_1(1) p_2(2).
\]

The same is true for all other values for $y_1$ and $y_2$, and it follows that $Y_1$ and $Y_2$ are
independent.
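
The check in Example 5.9 can be repeated for all 36 points rather than one. The sketch below is only an illustration: it assumes the equal-probability model stated in the solution, and the names `joint`, `p1`, and `p2` are chosen here, not taken from the text. Exact fractions are used so that the equality test in Theorem 5.4 is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint probability function: each of the 36 sample points has probability 1/36.
joint = {(y1, y2): Fraction(1, 36) for y1, y2 in product(faces, faces)}

# Marginal probability functions, found by summing over the other variable.
p1 = {y1: sum(joint[(y1, y2)] for y2 in faces) for y1 in faces}
p2 = {y2: sum(joint[(y1, y2)] for y1 in faces) for y2 in faces}

# Theorem 5.4 (discrete case): independence requires equality at every pair.
independent = all(joint[(y1, y2)] == p1[y1] * p2[y2] for (y1, y2) in joint)

print(p1[1], p2[2], independent)  # 1/6 1/6 True
```

A single pair that fails the equality would be enough to make `independent` false, which is the situation in the next example.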


EXAMPLE 5.10 Refer to Example 5.5. Is the number of Republicans in the sample independent of the
number of Democrats? (Is $Y_1$ independent of $Y_2$?)


Solution Independence of discrete random variables requires that $p(y_1, y_2) = p_1(y_1) p_2(y_2)$
for every choice $(y_1, y_2)$. Thus, if this equality is violated for any pair of values,
$(y_1, y_2)$, the random variables are dependent. Looking in the upper left-hand corner
of Table 5.2, we see

\[
p(0, 0) = 0.
\]

But $p_1(0) = 3/15$ and $p_2(0) = 6/15$. Hence,

\[
p(0, 0) \ne p_1(0) p_2(0),
\]

so $Y_1$ and $Y_2$ are dependent.


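EXAMPLE 5.11 Let

\[
f(y_1, y_2) = \begin{cases} 6 y_1 y_2^2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that $Y_1$ and $Y_2$ are independent.


Solution We have

\[
f_1(y_1) = \begin{cases}
\displaystyle \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_0^1 6 y_1 y_2^2\, dy_2 = 6 y_1 \left( \frac{y_2^3}{3} \right]_0^1 = 2 y_1, & 0 \le y_1 \le 1, \\[2ex]
\displaystyle \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{-\infty}^{\infty} 0\, dy_2 = 0, & \text{elsewhere.}
\end{cases}
\]

Similarly,

\[
f_2(y_2) = \begin{cases}
\displaystyle \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_0^1 6 y_1 y_2^2\, dy_1 = 3 y_2^2, & 0 \le y_2 \le 1, \\[2ex]
\displaystyle \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{-\infty}^{\infty} 0\, dy_1 = 0, & \text{elsewhere.}
\end{cases}
\]

Hence,

\[
f(y_1, y_2) = f_1(y_1) f_2(y_2)
\]

for all real numbers $(y_1, y_2)$, and, therefore, $Y_1$ and $Y_2$ are independent.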
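
As a numerical cross-check of Example 5.11, the marginal densities can be approximated by integrating the joint density and then compared with the joint density at a few points. This is a rough sketch only: the midpoint rule, the helper name `midpoint_integral`, and the grid of test points are choices made here for illustration, not part of the text.

```python
def f(y1, y2):
    """Joint density of Example 5.11: 6*y1*y2^2 on the unit square, 0 elsewhere."""
    return 6.0 * y1 * y2 ** 2 if 0 <= y1 <= 1 and 0 <= y2 <= 1 else 0.0

def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f1(y1):
    """Approximate marginal density of Y1 (integrate the joint density over y2)."""
    return midpoint_integral(lambda y2: f(y1, y2), 0.0, 1.0)

def f2(y2):
    """Approximate marginal density of Y2 (integrate the joint density over y1)."""
    return midpoint_integral(lambda y1: f(y1, y2), 0.0, 1.0)

# Largest gap between the joint density and the product of the marginals.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
max_gap = max(abs(f(a, b) - f1(a) * f2(b)) for a in grid for b in grid)
print(max_gap)  # very small, up to discretization error
```

A small gap on a finite grid is consistent with independence but does not prove it; the algebraic argument in the solution is what establishes the result.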


EXAMPLE 5.12 Let

\[
f(y_1, y_2) = \begin{cases} 2, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that $Y_1$ and $Y_2$ are dependent.


Solution We see that $f(y_1, y_2) = 2$ over the shaded region shown in Figure 5.7. Therefore,

\[
f_1(y_1) = \begin{cases}
\displaystyle \int_0^{y_1} 2\, dy_2 = 2 y_2 \Big]_0^{y_1} = 2 y_1, & 0 \le y_1 \le 1, \\[2ex]
0, & \text{elsewhere.}
\end{cases}
\]

[FIGURE 5.7 Region over which $f(y_1, y_2)$ is positive, Example 5.12]

Similarly,

\[
f_2(y_2) = \begin{cases}
\displaystyle \int_{y_2}^{1} 2\, dy_1 = 2 y_1 \Big]_{y_2}^{1} = 2(1 - y_2), & 0 \le y_2 \le 1, \\[2ex]
0, & \text{elsewhere.}
\end{cases}
\]

Hence,

\[
f(y_1, y_2) \ne f_1(y_1) f_2(y_2)
\]

for some pair of real numbers $(y_1, y_2)$, and, therefore, $Y_1$ and $Y_2$ are dependent.
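
To make "for some pair of real numbers" concrete, one can evaluate both sides at a specific point. The sketch below uses the marginal densities derived in the solution; the two test points are arbitrary choices made here for illustration.

```python
def joint(y1, y2):
    """Joint density of Example 5.12: 2 on the triangle 0 <= y2 <= y1 <= 1."""
    return 2.0 if 0 <= y2 <= y1 <= 1 else 0.0

def f1(y1):
    """Marginal density of Y1 from the solution: 2*y1 on [0, 1]."""
    return 2.0 * y1 if 0 <= y1 <= 1 else 0.0

def f2(y2):
    """Marginal density of Y2 from the solution: 2*(1 - y2) on [0, 1]."""
    return 2.0 * (1.0 - y2) if 0 <= y2 <= 1 else 0.0

# One point outside the triangle and one inside it.
for point in [(0.25, 0.75), (0.5, 0.25)]:
    lhs = joint(*point)
    rhs = f1(point[0]) * f2(point[1])
    print(point, lhs, rhs)
# (0.25, 0.75) 0.0 0.25
# (0.5, 0.25) 2.0 1.5
```

At (0.25, 0.75) the joint density is zero because the point lies outside the triangle, yet both marginal densities are positive there. That mismatch is typical when the region of positive density is not a rectangle.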


You will observe a distinct difference in the limits of integration employed in
finding the marginal density functions obtained in Examples 5.11 and 5.12. The limits
of integration for $y_2$ involved in finding the marginal density of $Y_1$ in Example 5.12
depended on $y_1$. In contrast, the limits of integration were constants when we found the
marginal density functions in Example 5.11. If the limits of integration are constants,
the following theorem provides an easy way to show independence of two random
variables.


THEOREM 5.5 Let $Y_1$ and $Y_2$ have a joint density $f(y_1, y_2)$ that is positive if and only if
$a \le y_1 \le b$ and $c \le y_2 \le d$, for constants $a$, $b$, $c$, and $d$; and $f(y_1, y_2) = 0$
otherwise. Then $Y_1$ and $Y_2$ are independent random variables if and only if

\[
f(y_1, y_2) = g(y_1) h(y_2)
\]

where $g(y_1)$ is a nonnegative function of $y_1$ alone and $h(y_2)$ is a nonnegative
function of $y_2$ alone.


The proof of this theorem is omitted. (See "References and Further Readings" at
the end of the chapter.) The key benefit of the result given in Theorem 5.5 is that
we do not actually need to derive the marginal densities. Indeed, the functions $g(y_1)$
and $h(y_2)$ need not, themselves, be density functions (although they will be constant
multiples of the marginal densities, should we go to the bother of determining the
marginal densities).


EXAMPLE 5.13 Let $Y_1$ and $Y_2$ have a joint density given by

\[
f(y_1, y_2) = \begin{cases} 2 y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are $Y_1$ and $Y_2$ independent variables?


Solution Notice that $f(y_1, y_2)$ is positive if and only if $0 \le y_1 \le 1$ and $0 \le y_2 \le 1$. Further,
$f(y_1, y_2) = g(y_1) h(y_2)$, where

\[
g(y_1) = \begin{cases} y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\qquad \text{and} \qquad
h(y_2) = \begin{cases} 2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Therefore, $Y_1$ and $Y_2$ are independent random variables. Notice that $g(y_1)$ and
$h(y_2)$, as defined here, are not density functions, although $2g(y_1)$ and $h(y_2)/2$ are
densities.
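
The closing remark of Example 5.13 can be checked numerically: each factor integrates to a constant, and dividing by that constant gives a density. The following is a sketch under that reading; the helper `integrate` and the variable names are chosen here for illustration.

```python
def integrate(func, a, b, n=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def g(y1):
    """Factor g(y1) = y1 on [0, 1] from Example 5.13."""
    return y1

def h(y2):
    """Factor h(y2) = 2 on [0, 1] from Example 5.13."""
    return 2.0

area_g = integrate(g, 0.0, 1.0)   # about 1/2, so g / (1/2) = 2*g is a density
area_h = integrate(h, 0.0, 1.0)   # about 2,   so h / 2 is a density

print(area_g, area_h, area_g * area_h)
```

The product of the two areas is 1 because the joint density itself integrates to 1. This is why the normalizing constants for $g$ and $h$ are reciprocals of each other.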


EXAMPLE 5.14 Refer to Example 5.4. Is $Y_1$, the amount in stock, independent of $Y_2$, the amount sold?


Solution Because the density function is positive if and only if $0 \le y_2 \le y_1 \le 1$, there do
not exist constants $a$, $b$, $c$, and $d$ such that the density is positive over the region
$a \le y_1 \le b$, $c \le y_2 \le d$. Thus, Theorem 5.5 cannot be applied. However, $Y_1$ and $Y_2$
can be shown to be dependent random variables because the joint density is not the
product of the marginal densities.


Definition 5.8 easily can be generalized to $n$ dimensions. Suppose that we have $n$
random variables, $Y_1, \ldots, Y_n$, where $Y_i$ has distribution function $F_i(y_i)$, for
$i = 1, 2, \ldots, n$; and where $Y_1, Y_2, \ldots, Y_n$ have joint distribution function
$F(y_1, y_2, \ldots, y_n)$. Then $Y_1, Y_2, \ldots, Y_n$ are independent if and only if

\[
F(y_1, y_2, \ldots, y_n) = F_1(y_1) \cdots F_n(y_n)
\]

for all real numbers $y_1, y_2, \ldots, y_n$, with the obvious equivalent forms for the discrete
and continuous cases.


Exercises


5.43 Let $Y_1$ and $Y_2$ have joint density function $f(y_1, y_2)$ and marginal densities $f_1(y_1)$ and $f_2(y_2)$,
respectively. Show that $Y_1$ and $Y_2$ are independent if and only if $f(y_1 \mid y_2) = f_1(y_1)$ for all
values of $y_1$ and for all $y_2$ such that $f_2(y_2) > 0$. A completely analogous argument establishes
that $Y_1$ and $Y_2$ are independent if and only if $f(y_2 \mid y_1) = f_2(y_2)$ for all values of $y_2$ and for all
$y_1$ such that $f_1(y_1) > 0$.


5.44 Prove that the results in Exercise 5.43 also hold for discrete random variables.


5.45 In Exercise 5.1, we determined that the joint distribution of $Y_1$, the number of contracts awarded
to firm A, and $Y_2$, the number of contracts awarded to firm B, is given by the entries in the
following table.

              y1
  y2      0      1      2
  0      1/9    2/9    1/9
  1      2/9    2/9     0
  2      1/9     0      0

The marginal probability function of $Y_1$ was derived in Exercise 5.19 to be binomial with $n = 2$
and $p = 1/3$. Are $Y_1$ and $Y_2$ independent? Why?


5.46 Refer to Exercise 5.2. The number of heads in three coin tosses is binomially distributed with
$n = 3$, $p = 1/2$. Are the total number of heads and your winnings on the side bet independent?
[Examine your answer to Exercise 5.20(b).]


5.47 In Exercise 5.3, we determined that the joint probability distribution of $Y_1$, the number of
married executives, and $Y_2$, the number of never-married executives, is given by

\[
p(y_1, y_2) = \frac{\dbinom{4}{y_1} \dbinom{3}{y_2} \dbinom{2}{3 - y_1 - y_2}}{\dbinom{9}{3}},
\]

where $y_1$ and $y_2$ are integers, $0 \le y_1 \le 3$, $0 \le y_2 \le 3$, and $1 \le y_1 + y_2 \le 3$. Are $Y_1$ and $Y_2$
independent? (Recall your answer to Exercise 5.21.)


5.48 In Exercise 5.4, you were given the following joint probability function for

\[
Y_1 = \begin{cases} 0, & \text{if child survived,} \\ 1, & \text{if not,} \end{cases}
\qquad \text{and} \qquad
Y_2 = \begin{cases} 0, & \text{if no belt used,} \\ 1, & \text{if adult belt used,} \\ 2, & \text{if car-seat belt used.} \end{cases}
\]

              y1
  y2       0      1     Total
  0       .38    .17     .55
  1       .14    .02     .16
  2       .24    .05     .29
  Total   .76    .24    1.00

Are $Y_1$ and $Y_2$ independent? Why or why not?


5.49 In Example 5.4 and Exercise 5.5, we considered the joint density of $Y_1$, the proportion of the
capacity of the tank that is stocked at the beginning of the week, and $Y_2$, the proportion of the
capacity sold during the week, given by

\[
f(y_1, y_2) = \begin{cases} 3 y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that $Y_1$ and $Y_2$ are dependent.


5.50 In Exercise 5.6, we assumed that if a radioactive particle is randomly located in a square with
sides of unit length, a reasonable model for the joint density function for $Y_1$ and $Y_2$ is

\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Are $Y_1$ and $Y_2$ independent?
b Does the result from part (a) explain the results you obtained in Exercise 5.24 (d)-(f)?
Why?


5.51 In Exercise 5.7, we considered $Y_1$ and $Y_2$ with joint density function

\[
f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Are $Y_1$ and $Y_2$ independent?
b Does the result from part (a) explain the results you obtained in Exercise 5.25 (d)-(f)? Why?


5.52 In Exercise 5.8, we derived the fact that

\[
f(y_1, y_2) = \begin{cases} 4 y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function. Are $Y_1$ and $Y_2$ independent?


5.53 In Exercise 5.9, we determined that

\[
f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function. Are $Y_1$ and $Y_2$ independent?


5.54 In Exercise 5.10, we proved that

\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2 y_2 \le y_1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function for $Y_1$, the amount of pollutant per sample collected
above the stack without the cleaning device, and $Y_2$, the amount collected above the stack with
the cleaner. Are the amounts of pollutants per sample collected with and without the cleaning
device independent?


5.55 Suppose that, as in Exercise 5.11, $Y_1$ and $Y_2$ are uniformly distributed over the triangle shaded
in the accompanying diagram. Are $Y_1$ and $Y_2$ independent?


5.56 In Exercise 5.12, we were given the following joint probability density function for the random
variables $Y_1$ and $Y_2$, which were the proportions of two components in a sample from a mixture
of insecticide:

\[
f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are $Y_1$ and $Y_2$ independent?


5.57 In Exercises 5.13 and 5.31, the joint density function of $Y_1$ and $Y_2$ was given by

\[
f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\ 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are the random variables $Y_1$ and $Y_2$ independent?


5.58 Suppose that the random variables $Y_1$ and $Y_2$ have joint probability density function, $f(y_1, y_2)$,
given by (see Exercises 5.14 and 5.32)

\[
f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\ y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that $Y_1$ and $Y_2$ are dependent random variables.


5.59 If $Y_1$ is the total time between a customer's arrival in the store and leaving the service window
and if $Y_2$ is the time spent in line before reaching the window, the joint density of these variables,
according to Exercise 5.15, is

\[
f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are $Y_1$ and $Y_2$ independent?


5.60 In Exercise 5.16, $Y_1$ and $Y_2$ denoted the proportions of time that employees I and II actually
spent working on their assigned tasks during a workday. The joint density of $Y_1$ and $Y_2$ is given
by

\[
f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are $Y_1$ and $Y_2$ independent?


5.61 In Exercise 5.18, $Y_1$ and $Y_2$ denoted the lengths of life, in hundreds of hours, for components
of types I and II, respectively, in an electronic system. The joint density of $Y_1$ and $Y_2$ is

\[
f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Are $Y_1$ and $Y_2$ independent?


5.62 Suppose that the probability that a head appears when a coin is tossed is $p$ and the probability
that a tail occurs is $q = 1 - p$. Person A tosses the coin until the first head appears and
stops. Person B does likewise. The results obtained by persons A and B are assumed to be
independent. What is the probability that A and B stop on exactly the same number toss?


5.63 Let $Y_1$ and $Y_2$ be independent exponentially distributed random variables, each with mean 1.
Find $P(Y_1 > Y_2 \mid Y_1 < 2Y_2)$.


5.64 Let $Y_1$ and $Y_2$ be independent random variables that are both uniformly distributed on the
interval (0, 1). Find $P(Y_1 < 2Y_2 \mid Y_1 < 3Y_2)$.


*5.65 Suppose that, for $-1 \le \alpha \le 1$, the probability density function of $(Y_1, Y_2)$ is given by

\[
f(y_1, y_2) = \begin{cases} [1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}]\, e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Show that the marginal distribution of $Y_1$ is exponential with mean 1.
b What is the marginal distribution of $Y_2$?
c Show that $Y_1$ and $Y_2$ are independent if and only if $\alpha = 0$.

Notice that these results imply that there are infinitely many joint densities such that both
marginals are exponential with mean 1.


*5.66 Let $F_1(y_1)$ and $F_2(y_2)$ be two distribution functions. For any $\alpha$, $-1 \le \alpha \le 1$, consider $Y_1$ and
$Y_2$ with joint distribution function

\[
F(y_1, y_2) = F_1(y_1) F_2(y_2) [1 - \alpha\{1 - F_1(y_1)\}\{1 - F_2(y_2)\}].
\]

a What is $F(y_1, \infty)$, the marginal distribution function of $Y_1$? [Hint: What is $F_2(\infty)$?]
b What is the marginal distribution function of $Y_2$?
c If $\alpha = 0$ why are $Y_1$ and $Y_2$ independent?
d Are $Y_1$ and $Y_2$ independent if $\alpha \ne 0$? Why?

Notice that this construction can be used to produce an infinite number of joint distribution
functions that have the same marginal distribution functions.


5.67 In Section 5.2, we argued that if $Y_1$ and $Y_2$ have joint cumulative distribution function $F(y_1, y_2)$
then for any $a < b$ and $c < d$

\[
P(a < Y_1 \le b,\ c < Y_2 \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c).
\]


If $Y_1$ and $Y_2$ are independent, show that

\[
P(a < Y_1 \le b,\ c < Y_2 \le d) = P(a < Y_1 \le b) \times P(c < Y_2 \le d).
\]

[Hint: Express $P(a < Y_1 \le b)$ in terms of $F_1(\cdot)$.]


5.68 A bus arrives at a bus stop at a uniformly distributed time over the interval 0 to 1 hour. A
passenger also arrives at the bus stop at a uniformly distributed time over the interval 0 to 1
hour. Assume that the arrival times of the bus and passenger are independent of one another and
that the passenger will wait for up to 1/4 hour for the bus to arrive. What is the probability that
the passenger will catch the bus? [Hint: Let $Y_1$ denote the bus arrival time and $Y_2$ the passenger
arrival time; determine the joint density of $Y_1$ and $Y_2$ and find $P(Y_2 \le Y_1 \le Y_2 + 1/4)$.]


5.69 The length of life $Y$ for fuses of a certain type is modeled by the exponential distribution, with

\[
f(y) = \begin{cases} (1/3)\, e^{-y/3}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

(The measurements are in hundreds of hours.)

a If two such fuses have independent lengths of life $Y_1$ and $Y_2$, find the joint probability
density function for $Y_1$ and $Y_2$.

b One fuse in part (a) is in a primary system, and the other is in a backup system that comes
into use only if the primary system fails. The total effective length of life of the two fuses
is then $Y_1 + Y_2$. Find $P(Y_1 + Y_2 \le 1)$.


5.70 A supermarket has two customers waiting to pay for their purchases at counter I and one
customer waiting to pay at counter II. Let $Y_1$ and $Y_2$ denote the numbers of customers who
spend more than $50 on groceries at the respective counters. Suppose that $Y_1$ and $Y_2$ are
independent binomial random variables, with the probability that a customer at counter I will
spend more than $50 equal to .2 and the probability that a customer at counter II will spend
more than $50 equal to .3. Find the

a joint probability distribution for $Y_1$ and $Y_2$.

b probability that not more than one of the three customers will spend more than $50.


5.71 Two telephone calls come into a switchboard at random times in a fixed one-hour period.
Assume that the calls are made independently of one another. What is the probability that the
calls are made

a in the first half hour?

b within five minutes of each other?


5.5 The Expected Value of a Function of Random Variables


You need only construct the multivariate analogue to the univariate situation to justify
the following definition.


DEFINITION 5.9 Let $g(Y_1, Y_2, \ldots, Y_k)$ be a function of the discrete random variables, $Y_1$,
$Y_2, \ldots, Y_k$, which have probability function $p(y_1, y_2, \ldots, y_k)$. Then the expected
value of $g(Y_1, Y_2, \ldots, Y_k)$ is

\[
E[g(Y_1, Y_2, \ldots, Y_k)] = \sum_{\text{all } y_k} \cdots \sum_{\text{all } y_2} \sum_{\text{all } y_1} g(y_1, y_2, \ldots, y_k)\, p(y_1, y_2, \ldots, y_k).
\]

If $Y_1, Y_2, \ldots, Y_k$ are continuous random variables with joint density function
$f(y_1, y_2, \ldots, y_k)$, then²

\[
E[g(Y_1, Y_2, \ldots, Y_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_k)\, f(y_1, y_2, \ldots, y_k)\, dy_1\, dy_2 \ldots dy_k.
\]


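EXAMPLE 5.15 Let $Y_1$ and $Y_2$ have joint density given by

\[
f(y_1, y_2) = \begin{cases} 2 y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find $E(Y_1 Y_2)$.


Solution From Definition 5.9 we obtain

\[
E(Y_1 Y_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 y_2 f(y_1, y_2)\, dy_1\, dy_2 = \int_0^1 \int_0^1 y_1 y_2 (2 y_1)\, dy_1\, dy_2
\]
\[
= \int_0^1 y_2 \left( \frac{2 y_1^3}{3} \right]_0^1 dy_2 = \int_0^1 \frac{2}{3}\, y_2\, dy_2 = \frac{2}{3} \left( \frac{y_2^2}{2} \right]_0^1 = \frac{1}{3}.
\]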
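
The double integral in Example 5.15 can be approximated directly from Definition 5.9. The sketch below assumes the density is positive only on the unit square, as in the example; the midpoint grid, the function name `expected_value`, and the grid size are choices made here for illustration.

```python
def expected_value(g, f, n=400):
    """Midpoint-rule approximation of E[g(Y1, Y2)] for a density f on the unit square."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(a, b) * f(a, b) for a in mids for b in mids)

def density(y1, y2):
    """Joint density of Example 5.15 on the unit square."""
    return 2.0 * y1

approx = expected_value(lambda y1, y2: y1 * y2, density)
print(approx)  # close to 1/3
```

Replacing the first argument with another function of `y1` and `y2` gives the expected value of that function under the same density.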


We will show that Definition 5.9 is consistent with Definition 4.5, in which we
defined the expected value of a univariate random variable. Consider two random
variables $Y_1$ and $Y_2$ with density function $f(y_1, y_2)$. We wish to find the expected
value of $g(Y_1, Y_2) = Y_1$.
Then from Definition 5.9 we have

\[
E(Y_1) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 f(y_1, y_2)\, dy_2\, dy_1
= \int_{-\infty}^{\infty} y_1 \left[ \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \right] dy_1.
\]

The quantity within the brackets, by definition, is the marginal density function for
$Y_1$. Therefore, we obtain

\[
E(Y_1) = \int_{-\infty}^{\infty} y_1 f_1(y_1)\, dy_1,
\]

which agrees with Definition 4.5.


2. Again, we say that the expectations exist if $\sum \cdots \sum |g(y_1, y_2, \ldots, y_k)|\, p(y_1, y_2, \ldots, y_k)$ or if
$\int \cdots \int |g(y_1, y_2, \ldots, y_k)|\, f(y_1, y_2, \ldots, y_k)\, dy_1 \ldots dy_k$ is finite.


EXAMPLE 5.16 Let $Y_1$ and $Y_2$ have a joint density given by

\[
f(y_1, y_2) = \begin{cases} 2 y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find the expected value of $Y_1$.


Solution

\[
E(Y_1) = \int_0^1 \int_0^1 y_1 (2 y_1)\, dy_1\, dy_2
= \int_0^1 \left( \frac{2 y_1^3}{3} \right]_0^1 dy_2 = \int_0^1 \frac{2}{3}\, dy_2 = \frac{2}{3}\, y_2 \Big]_0^1 = \frac{2}{3}.
\]

Refer to Figure 5.6 and estimate the expected value of $Y_1$. The value $E(Y_1) = 2/3$
appears to be quite reasonable.


EXAMPLE 5.17 In Figure 5.6 the mean value of $Y_2$ appears to be equal to .5. Let us confirm this visual
estimate. Find $E(Y_2)$.


Solution

\[
E(Y_2) = \int_0^1 \int_0^1 y_2 (2 y_1)\, dy_1\, dy_2 = \int_0^1 y_2 \left( \frac{2 y_1^2}{2} \right]_0^1 dy_2
= \int_0^1 y_2\, dy_2 = \frac{y_2^2}{2} \Big]_0^1 = \frac{1}{2}.
\]


EXAMPLE 5.18 Let $Y_1$ and $Y_2$ be random variables with density function

\[
f(y_1, y_2) = \begin{cases} 2 y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find $V(Y_1)$.


Solution The marginal density for $Y_1$ obtained in Example 5.6 is

\[
f_1(y_1) = \begin{cases} 2 y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Then $V(Y_1) = E(Y_1^2) - [E(Y_1)]^2$, and

\[
E(Y_1^k) = \int_{-\infty}^{\infty} y_1^k f_1(y_1)\, dy_1 = \int_0^1 y_1^k (2 y_1)\, dy_1 = \frac{2 y_1^{k+2}}{k + 2} \Big]_0^1 = \frac{2}{k + 2}.
\]

If we let $k = 1$ and $k = 2$, it follows that $E(Y_1)$ and $E(Y_1^2)$ are 2/3 and 1/2,
respectively. Then $V(Y_1) = E(Y_1^2) - [E(Y_1)]^2 = 1/2 - (2/3)^2 = 1/18$.
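
The arithmetic in Example 5.18 follows from the moment formula $E(Y_1^k) = 2/(k + 2)$ derived in the solution. The short sketch below evaluates that formula with exact fractions; the function name `moment` is chosen here for illustration.

```python
from fractions import Fraction

def moment(k):
    """k-th moment of Y1 when its marginal density is 2*y1 on [0, 1]: 2 / (k + 2)."""
    return Fraction(2, k + 2)

mean = moment(1)
second_moment = moment(2)
variance = second_moment - mean ** 2   # V(Y1) = E(Y1^2) - [E(Y1)]^2

print(mean, second_moment, variance)  # 2/3 1/2 1/18
```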


EXAMPLE 5.19 A process for producing an industrial chemical yields a product containing two types
of impurities. For a specified sample from this process, let $Y_1$ denote the proportion of
impurities in the sample and let $Y_2$ denote the proportion of type I impurities among
all impurities found. Suppose that the joint distribution of $Y_1$ and $Y_2$ can be modeled
by the following probability density function:

\[
f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Find the expected value of the proportion of type I impurities in the sample.


Solution Because $Y_1$ is the proportion of impurities in the sample and $Y_2$ is the proportion of
type I impurities among the sample impurities, it follows that $Y_1 Y_2$ is the proportion
of type I impurities in the entire sample. Thus, we want to find $E(Y_1 Y_2)$:

\[
E(Y_1 Y_2) = \int_0^1 \int_0^1 2 y_1 y_2 (1 - y_1)\, dy_2\, dy_1 = 2 \int_0^1 y_1 (1 - y_1) \left( \frac{1}{2} \right) dy_1
\]
\[
= \int_0^1 (y_1 - y_1^2)\, dy_1 = \left( \frac{y_1^2}{2} - \frac{y_1^3}{3} \right]_0^1 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}.
\]

Therefore, we would expect 1/6 of the sample to be made up of type I impurities.


5.6 Special Theorems


Theorems that facilitate computation of the expected value of a constant, the expected
value of a constant times a function of random variables, and the expected value of
the sum of functions of random variables are similar to those for the univariate case.


THEOREM 5.6 Let $c$ be a constant. Then

\[
E(c) = c.
\]


THEOREM 5.7 Let $g(Y_1, Y_2)$ be a function of the random variables $Y_1$ and $Y_2$ and let $c$ be a
constant. Then

\[
E[c\, g(Y_1, Y_2)] = c\, E[g(Y_1, Y_2)].
\]


THEOREM 5.8 Let $Y_1$ and $Y_2$ be random variables and $g_1(Y_1, Y_2), g_2(Y_1, Y_2), \ldots, g_k(Y_1, Y_2)$
be functions of $Y_1$ and $Y_2$. Then

\[
E[g_1(Y_1, Y_2) + g_2(Y_1, Y_2) + \cdots + g_k(Y_1, Y_2)]
= E[g_1(Y_1, Y_2)] + E[g_2(Y_1, Y_2)] + \cdots + E[g_k(Y_1, Y_2)].
\]


The proofs of these three theorems are analogous to the univariate cases discussed
in Chapters 3 and 4.
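
A small discrete example may help fix the three theorems. The joint probability function below is hypothetical, invented here only for illustration (it is not taken from the text), and the helper `expect` simply applies the discrete case of Definition 5.9.

```python
from fractions import Fraction as F

# Hypothetical joint probability function p(y1, y2); the values sum to 1.
p = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def expect(g):
    """E[g(Y1, Y2)] by the discrete case of Definition 5.9."""
    return sum(g(y1, y2) * prob for (y1, y2), prob in p.items())

c = 5

# Theorem 5.6: the expected value of a constant is that constant.
const_ok = expect(lambda y1, y2: c) == c

# Theorem 5.7: a constant factor can be moved outside the expectation.
scale_ok = expect(lambda y1, y2: c * y1 * y2) == c * expect(lambda y1, y2: y1 * y2)

# Theorem 5.8: the expectation of a sum is the sum of the expectations.
sum_ok = (expect(lambda y1, y2: y1 + y2 ** 2)
          == expect(lambda y1, y2: y1) + expect(lambda y1, y2: y2 ** 2))

print(const_ok, scale_ok, sum_ok)  # True True True
```

None of the three checks relies on independence of $Y_1$ and $Y_2$; the table above was deliberately chosen so that the two variables are dependent.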


EXAMPLE 5.20 Refer to Example 5.4. The random variable $Y_1 - Y_2$ denotes the proportional amount
of gasoline remaining at the end of the week. Find $E(Y_1 - Y_2)$.


Solution Employing Theorem 5.8 with $g_1(Y_1, Y_2) = Y_1$ and $g_2(Y_1, Y_2) = -Y_2$, we see that

\[
E(Y_1 - Y_2) = E(Y_1) + E(-Y_2).
\]

Theorem 5.7 applies, yielding $E(-Y_2) = -E(Y_2)$; therefore,

\[
E(Y_1 - Y_2) = E(Y_1) - E(Y_2).
\]

Also,

\[
E(Y_1) = \int_0^1 \int_0^{y_1} y_1 (3 y_1)\, dy_2\, dy_1 = \int_0^1 3 y_1^3\, dy_1 = \frac{3}{4}\, y_1^4 \Big]_0^1 = \frac{3}{4},
\]
\[
E(Y_2) = \int_0^1 \int_0^{y_1} y_2 (3 y_1)\, dy_2\, dy_1 = \int_0^1 3 y_1 \left( \frac{y_2^2}{2} \right]_0^{y_1} dy_1 = \int_0^1 \frac{3}{2}\, y_1^3\, dy_1
= \frac{3}{8}\, y_1^4 \Big]_0^1 = \frac{3}{8}.
\]

Thus,

\[
E(Y_1 - Y_2) = (3/4) - (3/8) = 3/8,
\]

so we would expect 3/8 of the tank to be filled at the end of the week's sales.
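
The limits of integration in Example 5.20 depend on $y_1$, which is easy to get wrong. The sketch below mirrors the iterated integrals numerically, with the inner integral running from 0 to $y_1$. It takes the density $3y_1$ on $0 \le y_2 \le y_1 \le 1$ from the integrals above; the helper names and grid size are choices made here for illustration.

```python
def integrate(func, a, b, n=400):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def expect(g):
    """Approximate E[g(Y1, Y2)] for the density 3*y1 on 0 <= y2 <= y1 <= 1."""
    def inner(y1):
        # Inner integral over y2 runs from 0 to y1, as in the worked solution.
        return integrate(lambda y2: g(y1, y2) * 3.0 * y1, 0.0, y1)
    return integrate(inner, 0.0, 1.0)

e1 = expect(lambda y1, y2: y1)           # close to 3/4
e2 = expect(lambda y1, y2: y2)           # close to 3/8
e_diff = expect(lambda y1, y2: y1 - y2)  # close to 3/8

print(e1, e2, e_diff)
```

Computing `e_diff` directly and as `e1 - e2` gives the same value up to rounding, which is Theorems 5.7 and 5.8 at work.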


If the random variables under study are independent, we sometimes can simplify
the work involved in finding expectations. The following theorem is quite useful in
this regard.


THEOREM 5.9 Let $Y_1$ and $Y_2$ be independent random variables and $g(Y_1)$ and $h(Y_2)$ be functions
of only $Y_1$ and $Y_2$, respectively. Then

\[
E[g(Y_1) h(Y_2)] = E[g(Y_1)]\, E[h(Y_2)],
\]

provided that the expectations exist.


Proof We will give the proof of the result for the continuous case. Let $f(y_1, y_2)$ denote
the joint density of $Y_1$ and $Y_2$. The product $g(Y_1) h(Y_2)$ is a function of $Y_1$ and
$Y_2$. Hence, by Definition 5.9 and the assumption that $Y_1$ and $Y_2$ are independent,

\[
E[g(Y_1) h(Y_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1) h(y_2) f(y_1, y_2)\, dy_2\, dy_1
\]
\[
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1) h(y_2) f_1(y_1) f_2(y_2)\, dy_2\, dy_1
\]
\[
= \int_{-\infty}^{\infty} g(y_1) f_1(y_1) \left[ \int_{-\infty}^{\infty} h(y_2) f_2(y_2)\, dy_2 \right] dy_1
\]
\[
= \int_{-\infty}^{\infty} g(y_1) f_1(y_1)\, E[h(Y_2)]\, dy_1
\]
\[
= E[h(Y_2)] \int_{-\infty}^{\infty} g(y_1) f_1(y_1)\, dy_1 = E[g(Y_1)]\, E[h(Y_2)].
\]

The proof for the discrete case follows in an analogous manner.


EXAMPLE 5.21 Refer to Example 5.19. In that example we found $E(Y_1 Y_2)$ directly. By investigating
the form of the joint density function given there, we can see that $Y_1$ and $Y_2$ are
independent. Find $E(Y_1 Y_2)$ by using the result that $E(Y_1 Y_2) = E(Y_1) E(Y_2)$ if $Y_1$ and
$Y_2$ are independent.


Solution The joint density function is given by

\[
f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Hence,

\[
f_1(y_1) = \begin{cases} \displaystyle \int_0^1 2(1 - y_1)\, dy_2 = 2(1 - y_1), & 0 \le y_1 \le 1, \\[1ex] 0, & \text{elsewhere,} \end{cases}
\]

and

\[
f_2(y_2) = \begin{cases} \displaystyle \int_0^1 2(1 - y_1)\, dy_1 = -(1 - y_1)^2 \Big]_0^1 = 1, & 0 \le y_2 \le 1, \\[1ex] 0, & \text{elsewhere.} \end{cases}
\]

We then have

\[
E(Y_1) = \int_0^1 y_1 [2(1 - y_1)]\, dy_1 = 2 \left( \frac{y_1^2}{2} - \frac{y_1^3}{3} \right]_0^1 = \frac{1}{3},
\]
\[
E(Y_2) = 1/2
\]

because $Y_2$ is uniformly distributed over (0, 1).
It follows that

\[
E(Y_1 Y_2) = E(Y_1) E(Y_2) = (1/3)(1/2) = 1/6,
\]

which agrees with the answer in Example 5.19.
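
Theorem 5.9 can also be illustrated by simulation. The sketch below draws $Y_1$ from the marginal density $2(1 - y_1)$ by inverting its distribution function $F_1(y_1) = 1 - (1 - y_1)^2$, and draws $Y_2$ independently from the uniform distribution on (0, 1). The seed, the sample size, and the inverse-transform sampling method are assumptions made here for illustration; they are not part of the text.

```python
import random

rng = random.Random(2024)   # fixed seed so the run is repeatable
n = 200_000

# Inverse-transform sampling: solving u = 1 - (1 - y)^2 gives y = 1 - sqrt(1 - u).
y1 = [1.0 - (1.0 - rng.random()) ** 0.5 for _ in range(n)]
# Y2 is uniform on (0, 1) and drawn independently of Y1.
y2 = [rng.random() for _ in range(n)]

mean_y1 = sum(y1) / n
mean_y2 = sum(y2) / n
mean_product = sum(a * b for a, b in zip(y1, y2)) / n

# For independent draws, the sample mean of the product should be near
# the product of the sample means, and both should be near 1/6.
print(mean_y1, mean_y2, mean_product, mean_y1 * mean_y2)
```

Agreement here is only approximate because of sampling variation; the exact value 1/6 comes from the integrals in the example.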


Exercises


5.72 In Exercise 5.1, we determined that the joint distribution of $Y_1$, the number of contracts awarded
to firm A, and $Y_2$, the number of contracts awarded to firm B, is given by the entries in the
following table.

              y1
  y2      0      1      2
  0      1/9    2/9    1/9
  1      2/9    2/9     0
  2      1/9     0      0

The marginal probability function of $Y_1$ was derived in Exercise 5.19 to be binomial with $n = 2$
and $p = 1/3$. Find

a $E(Y_1)$.
b $V(Y_1)$.
c $E(Y_1 - Y_2)$.


5.73 In Exercise 5.3, we determined that the joint probability distribution of $Y_1$, the number of
married executives, and $Y_2$, the number of never-married executives, is given by

\[
p(y_1, y_2) = \frac{\dbinom{4}{y_1} \dbinom{3}{y_2} \dbinom{2}{3 - y_1 - y_2}}{\dbinom{9}{3}},
\]

where $y_1$ and $y_2$ are integers, $0 \le y_1 \le 3$, $0 \le y_2 \le 3$, and $1 \le y_1 + y_2 \le 3$. Find the expected
number of married executives among the three selected for promotion. (See Exercise 5.21.)


5.74 Refer to Exercises 5.6, 5.24, and 5.50. Suppose that a radioactive particle is randomly located
in a square with sides of unit length. A reasonable model for the joint density function for $Y_1$
and $Y_2$ is

\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a What is $E(Y_1 - Y_2)$?
b What is $E(Y_1 Y_2)$?
c What is $E(Y_1^2 + Y_2^2)$?
d What is $V(Y_1 Y_2)$?


5.75 Refer to Exercises 5.7, 5.25, and 5.51. Let $Y_1$ and $Y_2$ have joint density function

\[
f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

a What are $E(Y_1 + Y_2)$ and $V(Y_1 + Y_2)$?
b What is $P(Y_1 - Y_2 > 3)$?
c What is $P(Y_1 - Y_2 < -3)$?
d What are $E(Y_1 - Y_2)$ and $V(Y_1 - Y_2)$?
e What do you notice about $V(Y_1 + Y_2)$ and $V(Y_1 - Y_2)$?


5.76 In Exercise 5.8, we derived the fact that

\[
f(y_1, y_2) = \begin{cases} 4 y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

a Find $E(Y_1)$.
b Find $V(Y_1)$.
c Find $E(Y_1 - Y_2)$.


5.77 In Exercise 5.9, we determined that

\[
f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function. Find

a $E(Y_1)$ and $E(Y_2)$.
b $V(Y_1)$ and $V(Y_2)$.
c $E(Y_1 - 3Y_2)$.


5.78 In Exercise 5.10, we proved that

\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2 y_2 \le y_1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function for $Y_1$, the amount of pollutant per sample collected
above the stack without the cleaning device, and $Y_2$, the amount collected above the stack with
the cleaner.

a Find $E(Y_1)$ and $E(Y_2)$.
b Find $V(Y_1)$ and $V(Y_2)$.
c The random variable $Y_1 - Y_2$ represents the amount by which the weight of pollutant can
be reduced by using the cleaning device. Find $E(Y_1 - Y_2)$.
d Find $V(Y_1 - Y_2)$. Within what limits would you expect $Y_1 - Y_2$ to fall?


5.79 Suppose that, as in Exercise 5.11, $Y_1$ and $Y_2$ are uniformly distributed over the triangle shaded
in the accompanying diagram. Find $E(Y_1 Y_2)$.

[Diagram: shaded triangle in the $(y_1, y_2)$ plane with labeled points $(-1, 0)$, $(1, 0)$, and $(0, 1)$]


5.80 In Exercise 5.16, $Y_1$ and $Y_2$ denoted the proportions of time that employees I and II actually
spent working on their assigned tasks during a workday. The joint density of $Y_1$ and $Y_2$ is


given by

\[
f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Employee I has a higher productivity rating than employee II and a measure of the total
productivity of the pair of employees is $30Y_1 + 25Y_2$. Find the expected value of this measure
of productivity.


5.81 In Exercise 5.18, $Y_1$ and $Y_2$ denoted the lengths of life, in hundreds of hours, for components
of types I and II, respectively, in an electronic system. The joint density of $Y_1$ and $Y_2$ is

\[
f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

One way to measure the relative efficiency of the two components is to compute the ratio $Y_2/Y_1$.
Find $E(Y_2/Y_1)$. [Hint: In Exercise 5.61, we proved that $Y_1$ and $Y_2$ are independent.]


5.82 In Exercise 5.38, we determined that the joint density function for $Y_1$, the weight in tons of a
bulk item stocked by a supplier, and $Y_2$, the weight of the item sold by the supplier, has joint
density

\[
f(y_1, y_2) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

In this case, the random variable $Y_1 - Y_2$ measures the amount of stock remaining at the end
of the week, a quantity of great importance to the supplier. Find $E(Y_1 - Y_2)$.


5.83 In Exercise 5.42, we determined that the unconditional probability distribution for $Y$, the
number of defects per yard in a certain fabric, is

\[
p(y) = (1/2)^{y+1}, \qquad y = 0, 1, 2, \ldots.
\]

Find the expected number of defects per yard.


5.84 In Exercise 5.62, we considered two individuals who each tossed a coin until the first head
appears. Let $Y_1$ and $Y_2$ denote the number of times that persons A and B toss the coin, respectively.
If heads occurs with probability $p$ and tails occurs with probability $q = 1 - p$, it is
reasonable to conclude that $Y_1$ and $Y_2$ are independent and that each has a geometric distribution
with parameter $p$. Consider $Y_1 - Y_2$, the difference in the number of tosses required by the two
individuals.

a Find $E(Y_1)$, $E(Y_2)$, and $E(Y_1 - Y_2)$.
b Find $E(Y_1^2)$, $E(Y_2^2)$, and $E(Y_1 Y_2)$ (recall that $Y_1$ and $Y_2$ are independent).
c Find $E(Y_1 - Y_2)^2$ and $V(Y_1 - Y_2)$.
d Give an interval that will contain $Y_1 - Y_2$ with probability at least 8/9.


5.85 In Exercise 5.65, we considered random variables $Y_1$ and $Y_2$ that, for $-1 \le \alpha \le 1$, have joint
density function given by 


[1—o((1—2e?1)(1—2e72)]e? ?», Oxy,0z у, 
fou уз) = 
Г elsewhere 
and established that the marginal distributions of Y, and У are both exponential with mean 1. 
Find 
a E(Y,) and Е(Ү,). 
b V(Y)) and У(У,). 
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*5.86 


5.87 


5.88 


5.7 


с Е(Ү, — Y>). 
а EY”). 
e V(Y, — Y2). Within what limits would you expect Y; — Y» to fall? 


Suppose that Z is a standard normal random variable and that Y, and Y, are x?-distributed 
random variables with v, and v, degrees of freedom, respectively. Further, assume that Z, Y;, 
and Y, are independent. 


a Define W = Z/A/Y,. Find E(W) and V (W). What assumptions do you need about the 
value of v? [Hint: W = Z(1/A/Yi) = g(Z)h(Yi). Use Theorem 5.9. The results of 
Exercise 4.112(d) will also be useful.] 

b Define U = Y;/Y;. Find E(U) and V (U). What assumptions about v, and v» do you need? 
Use the hint from part (a). 


Suppose that Y; and Y? are independent х? random variables with v; and v; degrees of freedom, 
respectively. Find 


a E(Yi +»). 
b V(Y; + Y2). [Hint: Use Theorem 5.9 and the result of Exercise 4.112(a).] 


Suppose that you are told to toss a die until you have observed each of the six faces. What is 
the expected number of tosses required to complete your assignment? [Hint: If Y is the number 
of trials to complete the assignment, Y = Ү + Y + Y3 + Y4 + Ys + Yo, where Y; is the trial on 
which the first face is tossed, Y; = 1, Ү is the number of additional tosses required to get a face 
different than the first, Уз is the number of additional tosses required to get a face different than 
the first two distinct faces, ..., Ys is the number of additional tosses to get the last remaining 
face after all other faces have been observed. Notice further that fori = 1,2,...,6, Y; hasa 
geometric distribution with success probability (7 — i)/6.] 


The Covariance of Two Random Variables 


Intuitively, we think of the dependence of two random variables $Y_1$ and $Y_2$ as implying that one variable, say $Y_1$, either increases or decreases as $Y_2$ changes. We will confine our attention to two measures of dependence: the covariance between two random variables and their correlation coefficient. In Figure 5.8(a) and (b), we give plots of the observed values of two variables, $Y_1$ and $Y_2$, for samples of $n = 10$ experimental units drawn from each of two populations. If all the points fall along a straight line, as indicated in Figure 5.8(a), $Y_1$ and $Y_2$ are obviously dependent. In contrast, Figure 5.8(b) indicates little or no dependence between $Y_1$ and $Y_2$.

Suppose that we knew the values of $E(Y_1) = \mu_1$ and $E(Y_2) = \mu_2$ and located this point on the graph in Figure 5.8. Now locate a plotted point, $(y_1, y_2)$, on Figure 5.8(a) and measure the deviations $(y_1 - \mu_1)$ and $(y_2 - \mu_2)$. Both deviations assume the same algebraic sign for any point, $(y_1, y_2)$, and their product $(y_1 - \mu_1)(y_2 - \mu_2)$ is positive. Points to the right of $\mu_1$ yield pairs of positive deviations; points to the left produce pairs of negative deviations; and the average of the product of the deviations $(y_1 - \mu_1)(y_2 - \mu_2)$ is large and positive. If the linear relation indicated in Figure 5.8(a) had sloped downward to the right, all corresponding pairs of deviations would have been of the opposite sign, and the average value of $(y_1 - \mu_1)(y_2 - \mu_2)$ would have been a large negative number.


FIGURE 5.8 Dependent and independent observations for $(y_1, y_2)$. [Two scatterplots of $y_2$ against $y_1$, with $\mu_1$ and $\mu_2$ marked on the axes: (a) points falling along an upward-sloping straight line; (b) points scattered with no apparent pattern.]


The situation just described does not occur for Figure 5.8(b), where little dependence exists between $Y_1$ and $Y_2$. Their corresponding deviations $(y_1 - \mu_1)$ and $(y_2 - \mu_2)$ will assume the same algebraic sign for some points and opposite signs for others. Thus, the product $(y_1 - \mu_1)(y_2 - \mu_2)$ will be positive for some points, negative for others, and will average to some value near zero.

Clearly, the average value of $(Y_1 - \mu_1)(Y_2 - \mu_2)$ provides a measure of the linear dependence between $Y_1$ and $Y_2$. This quantity, $E[(Y_1 - \mu_1)(Y_2 - \mu_2)]$, is called the covariance of $Y_1$ and $Y_2$.


If $Y_1$ and $Y_2$ are random variables with means $\mu_1$ and $\mu_2$, respectively, the covariance of $Y_1$ and $Y_2$ is

$$\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)].$$


The larger the absolute value of the covariance of $Y_1$ and $Y_2$, the greater the linear dependence between $Y_1$ and $Y_2$. Positive values indicate that $Y_1$ increases as $Y_2$ increases; negative values indicate that $Y_1$ decreases as $Y_2$ increases. A zero value of the covariance indicates that the variables are uncorrelated and that there is no linear dependence between $Y_1$ and $Y_2$.

Unfortunately, it is difficult to employ the covariance as an absolute measure of dependence because its value depends upon the scale of measurement. As a result, it is difficult to determine at first glance whether a particular covariance is large or small. This problem can be eliminated by standardizing its value and using the correlation coefficient, $\rho$, a quantity related to the covariance and defined as

$$\rho = \frac{\operatorname{Cov}(Y_1, Y_2)}{\sigma_1 \sigma_2},$$

where $\sigma_1$ and $\sigma_2$ are the standard deviations of $Y_1$ and $Y_2$, respectively. Supplemental discussions of the correlation coefficient may be found in Hogg, Craig, and McKean (2005) and Myers (2000).

A proof that the correlation coefficient $\rho$ satisfies the inequality $-1 \le \rho \le 1$ is outlined in Exercise 5.167.
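
The scale dependence of the covariance, and the scale invariance of the correlation coefficient, can be seen in a small computation. The following is only a sketch: the joint probability function `pmf` and the rescaling constants 100 and 12 are hypothetical values chosen for illustration and do not come from the text.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint probability function {(y1, y2): probability},
# invented purely for illustration.
pmf = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}

def expect(pmf, g):
    """E[g(Y1, Y2)] for a discrete joint probability function."""
    return sum(p * g(y1, y2) for (y1, y2), p in pmf.items())

def cov_and_corr(pmf):
    """Return (covariance as an exact Fraction, correlation as a float)."""
    mu1 = expect(pmf, lambda a, b: a)
    mu2 = expect(pmf, lambda a, b: b)
    cov = expect(pmf, lambda a, b: (a - mu1) * (b - mu2))
    var1 = expect(pmf, lambda a, b: (a - mu1) ** 2)
    var2 = expect(pmf, lambda a, b: (b - mu2) ** 2)
    return cov, float(cov) / sqrt(float(var1 * var2))

def rescale(pmf, c1, c2):
    """Joint probability function of (c1*Y1, c2*Y2), e.g. after a change of units."""
    return {(c1 * y1, c2 * y2): p for (y1, y2), p in pmf.items()}

cov, rho = cov_and_corr(pmf)
cov_scaled, rho_scaled = cov_and_corr(rescale(pmf, 100, 12))
print(cov, cov_scaled)      # the covariance is multiplied by 100 * 12
print(rho, rho_scaled)      # the correlation coefficient is unchanged
```

Rescaling by positive constants multiplies the covariance by their product, while the standard deviations in the denominator of $\rho$ absorb exactly the same factor.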
The sign of the correlation coefficient is the same as the sign of the covariance. Thus, $\rho > 0$ indicates that $Y_2$ increases as $Y_1$ increases, and $\rho = +1$ implies perfect correlation, with all points falling on a straight line with positive slope. A value of $\rho = 0$ implies zero covariance and no correlation. A negative coefficient of correlation implies a decrease in $Y_2$ as $Y_1$ increases, and $\rho = -1$ implies perfect correlation, with all points falling on a straight line with negative slope. A convenient computational formula for the covariance is contained in the next theorem.


THEOREM 5.10
If $Y_1$ and $Y_2$ are random variables with means $\mu_1$ and $\mu_2$, respectively, then

$$\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - E(Y_1)E(Y_2).$$


Proof
$$\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2 - \mu_1 Y_2 - \mu_2 Y_1 + \mu_1 \mu_2).$$

From Theorem 5.8, the expected value of a sum is equal to the sum of the expected values; and from Theorem 5.7, the expected value of a constant times a function of random variables is the constant times the expected value. Thus,

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - \mu_1 E(Y_2) - \mu_2 E(Y_1) + \mu_1 \mu_2.$$

Because $E(Y_1) = \mu_1$ and $E(Y_2) = \mu_2$, it follows that

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = E(Y_1 Y_2) - \mu_1 \mu_2.$$


EXAMPLE 5.22
Refer to Example 5.4. Find the covariance between the amount in stock $Y_1$ and amount of sales $Y_2$.

Solution
Recall that $Y_1$ and $Y_2$ have joint density function given by

$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Thus,

$$E(Y_1 Y_2) = \int_0^1 \int_0^{y_1} y_1 y_2 (3y_1)\, dy_2\, dy_1 = \int_0^1 3y_1^2 \left( \frac{y_2^2}{2} \Big|_0^{y_1} \right) dy_1 = \int_0^1 \frac{3}{2} y_1^4\, dy_1 = \frac{3}{2} \left( \frac{y_1^5}{5} \Big|_0^1 \right) = \frac{3}{10}.$$

From Example 5.20, we know that $E(Y_1) = 3/4$ and $E(Y_2) = 3/8$. Thus, using Theorem 5.10, we obtain

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = (3/10) - (3/4)(3/8) = .30 - .28 = .02.$$

In this example, large values of $Y_2$ can occur only with large values of $Y_1$, and the density, $f(y_1, y_2)$, is larger for larger values of $Y_1$ (see Figure 5.4). Thus, it is intuitive that the covariance between $Y_1$ and $Y_2$ should be positive.
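
The arithmetic in Example 5.22 can be checked with exact fractions. The sketch below assumes the density $f(y_1, y_2) = 3y_1$ on $0 \le y_2 \le y_1 \le 1$ from Example 5.4; the helper name `moment` is ours, and its closed form comes from integrating $y_1^a y_2^b (3y_1)$ first over $y_2$ and then over $y_1$.

```python
from fractions import Fraction

def moment(a, b):
    """E(Y1**a * Y2**b) for f(y1, y2) = 3*y1 on 0 <= y2 <= y1 <= 1.

    Integrating y1**a * y2**b * 3*y1 over y2 in [0, y1] and then over
    y1 in [0, 1] gives 3 / ((b + 1) * (a + b + 3)).
    """
    return Fraction(3, (b + 1) * (a + b + 3))

e_y1 = moment(1, 0)            # E(Y1)
e_y2 = moment(0, 1)            # E(Y2)
e_y1y2 = moment(1, 1)          # E(Y1 * Y2)
cov = e_y1y2 - e_y1 * e_y2     # computational formula of Theorem 5.10

print(e_y1, e_y2, e_y1y2)
print(cov, float(cov))
```

The exact value is 3/160 = .01875, which agrees with the .02 reported in the example once the intermediate products are rounded to two decimal places.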
EXAMPLE 5.23
Let $Y_1$ and $Y_2$ have joint density given by

$$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the covariance of $Y_1$ and $Y_2$.

Solution
From Example 5.15, $E(Y_1 Y_2) = 1/3$. Also, from Examples 5.16 and 5.17, $\mu_1 = E(Y_1) = 2/3$ and $\mu_2 = E(Y_2) = 1/2$, so

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - \mu_1 \mu_2 = (1/3) - (2/3)(1/2) = 0.$$


Example 5.23 furnishes a specific example of the general result given in Theorem 5.11.


THEOREM 5.11
If $Y_1$ and $Y_2$ are independent random variables, then

$$\operatorname{Cov}(Y_1, Y_2) = 0.$$

Thus, independent random variables must be uncorrelated.


Proof
Theorem 5.10 establishes that

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - \mu_1 \mu_2.$$

Because $Y_1$ and $Y_2$ are independent, Theorem 5.9 implies that

$$E(Y_1 Y_2) = E(Y_1)E(Y_2) = \mu_1 \mu_2,$$

and the desired result follows immediately.


Notice that the random variables $Y_1$ and $Y_2$ of Example 5.23 are independent; hence, by Theorem 5.11, their covariance must be zero. The converse of Theorem 5.11 is not true, as will be illustrated in the following example.


EXAMPLE 5.24
Let $Y_1$ and $Y_2$ be discrete random variables with joint probability distribution as shown in Table 5.3. Show that $Y_1$ and $Y_2$ are dependent but have zero covariance.

Table 5.3 Joint probability distribution, Example 5.24

                 y_1
  y_2      -1      0      +1
  -1      1/16   3/16   1/16
   0      3/16    0     3/16
  +1      1/16   3/16   1/16

Solution
Calculation of marginal probabilities yields $p_1(-1) = p_1(1) = 5/16 = p_2(-1) = p_2(1)$, and $p_1(0) = 6/16 = p_2(0)$. The value $p(0, 0) = 0$ in the center cell stands out. Obviously,

$$p(0, 0) \ne p_1(0)\,p_2(0),$$

and this is sufficient to show that $Y_1$ and $Y_2$ are dependent.

Again looking at the marginal probabilities, we see that $E(Y_1) = E(Y_2) = 0$. Also,

$$\begin{aligned}
E(Y_1 Y_2) &= \sum_{\text{all } y_1} \sum_{\text{all } y_2} y_1 y_2\, p(y_1, y_2) \\
&= (-1)(-1)(1/16) + (-1)(0)(3/16) + (-1)(1)(1/16) \\
&\quad + (0)(-1)(3/16) + (0)(0)(0) + (0)(1)(3/16) \\
&\quad + (1)(-1)(1/16) + (1)(0)(3/16) + (1)(1)(1/16) \\
&= (1/16) - (1/16) - (1/16) + (1/16) = 0.
\end{aligned}$$

Thus,

$$\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = 0 - 0(0) = 0.$$

This example shows that the converse of Theorem 5.11 is not true. If the covariance of two random variables is zero, the variables need not be independent.
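
The calculations in Example 5.24 are mechanical enough to verify by machine. The following sketch enters Table 5.3 as a dictionary keyed by $(y_1, y_2)$ and recomputes the marginals, the covariance, and the independence check; the variable names are ours.

```python
from fractions import Fraction

s = Fraction(1, 16)
# Table 5.3, keyed by (y1, y2).
p = {
    (-1, -1): 1 * s, (0, -1): 3 * s, (1, -1): 1 * s,
    (-1, 0): 3 * s, (0, 0): 0 * s, (1, 0): 3 * s,
    (-1, 1): 1 * s, (0, 1): 3 * s, (1, 1): 1 * s,
}
values = (-1, 0, 1)

# Marginal probability functions p1(y1) and p2(y2).
p1 = {a: sum(p[(a, b)] for b in values) for a in values}
p2 = {b: sum(p[(a, b)] for a in values) for b in values}

e1 = sum(a * p1[a] for a in values)
e2 = sum(b * p2[b] for b in values)
e12 = sum(a * b * p[(a, b)] for a in values for b in values)
cov = e12 - e1 * e2

# Independence requires p(y1, y2) = p1(y1) * p2(y2) in every cell.
independent = all(p[(a, b)] == p1[a] * p2[b] for a in values for b in values)

print(p1[0], p2[0], cov, independent)
```

A single cell that violates the product rule, here the center cell, is enough to make `independent` false even though the covariance is exactly zero.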


Exercises

5.89 In Exercise 5.1, we determined that the joint distribution of $Y_1$, the number of contracts awarded to firm A, and $Y_2$, the number of contracts awarded to firm B, is given by the entries in the following table.

                y_1
  y_2      0      1      2
   0      1/9    2/9    1/9
   1      2/9    2/9     0
   2      1/9     0      0

Find $\operatorname{Cov}(Y_1, Y_2)$. Does it surprise you that $\operatorname{Cov}(Y_1, Y_2)$ is negative? Why?

5.90 In Exercise 5.3, we determined that the joint probability distribution of $Y_1$, the number of married executives, and $Y_2$, the number of never-married executives, is given by

$$p(y_1, y_2) = \frac{\binom{4}{y_1} \binom{3}{y_2} \binom{2}{3 - y_1 - y_2}}{\binom{9}{3}},$$

where $y_1$ and $y_2$ are integers, $0 \le y_1 \le 3$, $0 \le y_2 \le 3$, and $1 \le y_1 + y_2 \le 3$. Find $\operatorname{Cov}(Y_1, Y_2)$.

5.91 In Exercise 5.8, we derived the fact that

$$f(y_1, y_2) = \begin{cases} 4y_1y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $\operatorname{Cov}(Y_1, Y_2) = 0$. Does it surprise you that $\operatorname{Cov}(Y_1, Y_2)$ is zero? Why?

5.92 In Exercise 5.9, we determined that

$$f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function. Find $\operatorname{Cov}(Y_1, Y_2)$. Are $Y_1$ and $Y_2$ independent?

5.93 Let the discrete random variables $Y_1$ and $Y_2$ have the joint probability function

$$p(y_1, y_2) = 1/3, \quad \text{for } (y_1, y_2) = (-1, 0),\ (0, 1),\ (1, 0).$$

Find $\operatorname{Cov}(Y_1, Y_2)$. Notice that $Y_1$ and $Y_2$ are dependent. (Why?) This is another example of uncorrelated random variables that are not independent.

5.94 Let $Y_1$ and $Y_2$ be uncorrelated random variables and consider $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$.
a Find $\operatorname{Cov}(U_1, U_2)$ in terms of the variances of $Y_1$ and $Y_2$.
b Find an expression for the coefficient of correlation between $U_1$ and $U_2$.
c Is it possible that $\operatorname{Cov}(U_1, U_2) = 0$? When does this occur?

5.95 Suppose that, as in Exercises 5.11 and 5.79, $Y_1$ and $Y_2$ are uniformly distributed over the triangle shaded in the accompanying diagram.

[Diagram: triangle in the $(y_1, y_2)$ plane with vertices $(-1, 0)$, $(1, 0)$, and $(0, 1)$.]

a Find $\operatorname{Cov}(Y_1, Y_2)$.
b Are $Y_1$ and $Y_2$ independent? (See Exercise 5.55.)
c Find the coefficient of correlation for $Y_1$ and $Y_2$.
d Does your answer to part (b) lead you to doubt your answer to part (a)? Why or why not?

5.96 Suppose that the random variables $Y_1$ and $Y_2$ have means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Use the basic definition of the covariance of two random variables to establish that
a $\operatorname{Cov}(Y_1, Y_2) = \operatorname{Cov}(Y_2, Y_1)$.
b $\operatorname{Cov}(Y_1, Y_1) = V(Y_1) = \sigma_1^2$. That is, the covariance of a random variable and itself is just the variance of the random variable.

5.97 The random variables $Y_1$ and $Y_2$ are such that $E(Y_1) = 4$, $E(Y_2) = -1$, $V(Y_1) = 2$, and $V(Y_2) = 8$.
a What is $\operatorname{Cov}(Y_1, Y_1)$?
b Assuming that the means and variances are correct, as given, is it possible that $\operatorname{Cov}(Y_1, Y_2) = 7$? [Hint: If $\operatorname{Cov}(Y_1, Y_2) = 7$, what is the value of $\rho$, the coefficient of correlation?]
c Assuming that the means and variances are correct, what is the largest possible value for $\operatorname{Cov}(Y_1, Y_2)$? If $\operatorname{Cov}(Y_1, Y_2)$ achieves this largest value, what does that imply about the relationship between $Y_1$ and $Y_2$?
d Assuming that the means and variances are correct, what is the smallest possible value for $\operatorname{Cov}(Y_1, Y_2)$? If $\operatorname{Cov}(Y_1, Y_2)$ achieves this smallest value, what does that imply about the relationship between $Y_1$ and $Y_2$?

5.98 How big or small can $\operatorname{Cov}(Y_1, Y_2)$ be? Use the fact that $\rho^2 \le 1$ to show that

$$-\sqrt{V(Y_1) \times V(Y_2)} \le \operatorname{Cov}(Y_1, Y_2) \le \sqrt{V(Y_1) \times V(Y_2)}.$$

5.99 If $c$ is any constant and $Y$ is a random variable such that $E(Y)$ exists, show that $\operatorname{Cov}(c, Y) = 0$.

5.100 Let $Z$ be a standard normal random variable and let $Y_1 = Z$ and $Y_2 = Z^2$.
a What are $E(Y_1)$ and $E(Y_2)$?
b What is $E(Y_1 Y_2)$? [Hint: $E(Y_1 Y_2) = E(Z^3)$, recall Exercise 4.199.]
c What is $\operatorname{Cov}(Y_1, Y_2)$?
d Notice that $P(Y_2 > 1 \mid Y_1 > 1) = 1$. Are $Y_1$ and $Y_2$ independent?

5.101 In Exercise 5.65, we considered random variables $Y_1$ and $Y_2$ that, for $-1 \le \alpha \le 1$, have joint density function given by

$$f(y_1, y_2) = \begin{cases} [1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}]\,e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases}$$

We established that the marginal distributions of $Y_1$ and $Y_2$ are both exponential with mean 1 and showed that $Y_1$ and $Y_2$ are independent if and only if $\alpha = 0$. In Exercise 5.85, we derived $E(Y_1 Y_2)$.
a Derive $\operatorname{Cov}(Y_1, Y_2)$.
b Show that $\operatorname{Cov}(Y_1, Y_2) = 0$ if and only if $\alpha = 0$.
c Argue that $Y_1$ and $Y_2$ are independent if and only if $\rho = 0$.


5.8 The Expected Value and Variance of Linear Functions of Random Variables


In later chapters in this text, especially Chapters 9 and 11, we will frequently encounter parameter estimators that are linear functions of the measurements in a sample, $Y_1, Y_2, \ldots, Y_n$. If $a_1, a_2, \ldots, a_n$ are constants, we will need to find the expected value and variance of a linear function of the random variables $Y_1, Y_2, \ldots, Y_n$,

$$U_1 = a_1 Y_1 + a_2 Y_2 + a_3 Y_3 + \cdots + a_n Y_n = \sum_{i=1}^{n} a_i Y_i.$$

We also may be interested in the covariance between two such linear combinations. Results that simplify the calculation of these quantities are summarized in the following theorem.


THEOREM 5.12
Let $Y_1, Y_2, \ldots, Y_n$ and $X_1, X_2, \ldots, X_m$ be random variables with $E(Y_i) = \mu_i$ and $E(X_j) = \xi_j$. Define

$$U_1 = \sum_{i=1}^{n} a_i Y_i \quad \text{and} \quad U_2 = \sum_{j=1}^{m} b_j X_j$$

for constants $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_m$. Then the following hold:

a $E(U_1) = \sum_{i=1}^{n} a_i \mu_i$.

b $V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \sum\sum_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j)$, where the double sum is over all pairs $(i, j)$ with $i < j$.

c $\operatorname{Cov}(U_1, U_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \operatorname{Cov}(Y_i, X_j)$.


Before proceeding with the proof of Theorem 5.12, we illustrate the use of the 
theorem with an example. 


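EXAMPLE 5.25
Let $Y_1$, $Y_2$, and $Y_3$ be random variables, where $E(Y_1) = 1$, $E(Y_2) = 2$, $E(Y_3) = -1$, $V(Y_1) = 1$, $V(Y_2) = 3$, $V(Y_3) = 5$, $\operatorname{Cov}(Y_1, Y_2) = -0.4$, $\operatorname{Cov}(Y_1, Y_3) = 1/2$, and $\operatorname{Cov}(Y_2, Y_3) = 2$. Find the expected value and variance of $U = Y_1 - 2Y_2 + Y_3$. If $W = 3Y_1 + Y_2$, find $\operatorname{Cov}(U, W)$.

Solution
$U = a_1 Y_1 + a_2 Y_2 + a_3 Y_3$, where $a_1 = 1$, $a_2 = -2$, and $a_3 = 1$. Then by Theorem 5.12,

$$E(U) = a_1 E(Y_1) + a_2 E(Y_2) + a_3 E(Y_3) = (1)(1) + (-2)(2) + (1)(-1) = -4.$$

Similarly,

$$\begin{aligned}
V(U) &= a_1^2 V(Y_1) + a_2^2 V(Y_2) + a_3^2 V(Y_3) + 2a_1a_2 \operatorname{Cov}(Y_1, Y_2) \\
&\quad + 2a_1a_3 \operatorname{Cov}(Y_1, Y_3) + 2a_2a_3 \operatorname{Cov}(Y_2, Y_3) \\
&= (1)^2(1) + (-2)^2(3) + (1)^2(5) + (2)(1)(-2)(-0.4) \\
&\quad + (2)(1)(1)(1/2) + (2)(-2)(1)(2) \\
&= 12.6.
\end{aligned}$$

Notice that $W = b_1 Y_1 + b_2 Y_2$, where $b_1 = 3$ and $b_2 = 1$. Thus,

$$\begin{aligned}
\operatorname{Cov}(U, W) &= a_1b_1 \operatorname{Cov}(Y_1, Y_1) + a_1b_2 \operatorname{Cov}(Y_1, Y_2) + a_2b_1 \operatorname{Cov}(Y_2, Y_1) \\
&\quad + a_2b_2 \operatorname{Cov}(Y_2, Y_2) + a_3b_1 \operatorname{Cov}(Y_3, Y_1) + a_3b_2 \operatorname{Cov}(Y_3, Y_2).
\end{aligned}$$

Notice that, as established in Exercise 5.96, $\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(Y_j, Y_i)$ and $\operatorname{Cov}(Y_i, Y_i) = V(Y_i)$. Therefore,

$$\begin{aligned}
\operatorname{Cov}(U, W) &= (1)(3)(1) + (1)(1)(-0.4) + (-2)(3)(-0.4) \\
&\quad + (-2)(1)(3) + (1)(3)(1/2) + (1)(1)(2) \\
&= 2.5.
\end{aligned}$$

Because $\operatorname{Cov}(U, W) \ne 0$, it follows that $U$ and $W$ are dependent.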
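
The sums in Example 5.25 can also be organized as a table of variances and covariances. The sketch below stores that table as a nested list `sigma` (our name, with variances on the diagonal) and evaluates parts (a), (b), and (c) of Theorem 5.12 as sums over all index pairs; writing $W = 3Y_1 + Y_2 + 0 \cdot Y_3$ lets the same table serve for the covariance.

```python
from fractions import Fraction as F

mean = [F(1), F(2), F(-1)]             # E(Y1), E(Y2), E(Y3)
# sigma[i][j] = Cov(Y_{i+1}, Y_{j+1}); the diagonal holds the variances.
sigma = [
    [F(1), F(-2, 5), F(1, 2)],
    [F(-2, 5), F(3), F(2)],
    [F(1, 2), F(2), F(5)],
]
a = [F(1), F(-2), F(1)]                # U = Y1 - 2*Y2 + Y3
b = [F(3), F(1), F(0)]                 # W = 3*Y1 + Y2 + 0*Y3

def bilinear(u, m, v):
    """Sum of u[i] * m[i][j] * v[j] over all index pairs (i, j)."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

e_u = sum(ai * mi for ai, mi in zip(a, mean))   # Theorem 5.12(a)
v_u = bilinear(a, sigma, a)                     # Theorem 5.12(b)
cov_uw = bilinear(a, sigma, b)                  # Theorem 5.12(c)

print(e_u, float(v_u), float(cov_uw))
```

Summing over all ordered pairs counts each off-diagonal covariance twice, which is where the factor 2 in part (b) comes from.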


We now proceed with the proof of Theorem 5.12. 


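Proof
The theorem consists of three parts, of which (a) follows directly from Theorems 5.7 and 5.8. To prove (b), we appeal to the definition of variance and write

$$\begin{aligned}
V(U_1) &= E[U_1 - E(U_1)]^2 = E\left[ \sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i \right]^2 \\
&= E\left[ \sum_{i=1}^{n} a_i (Y_i - \mu_i) \right]^2 \\
&= E\left[ \sum_{i=1}^{n} a_i^2 (Y_i - \mu_i)^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j) \right] \\
&= \sum_{i=1}^{n} a_i^2 E(Y_i - \mu_i)^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j E[(Y_i - \mu_i)(Y_j - \mu_j)].
\end{aligned}$$

By the definitions of variance and covariance, we have

$$V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j \operatorname{Cov}(Y_i, Y_j).$$

Because $\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(Y_j, Y_i)$, we can write

$$V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \sum\sum_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j).$$

Similar steps can be used to obtain (c). We have

$$\begin{aligned}
\operatorname{Cov}(U_1, U_2) &= E\{[U_1 - E(U_1)][U_2 - E(U_2)]\} \\
&= E\left[ \left( \sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i \right) \left( \sum_{j=1}^{m} b_j X_j - \sum_{j=1}^{m} b_j \xi_j \right) \right] \\
&= E\left\{ \left[ \sum_{i=1}^{n} a_i (Y_i - \mu_i) \right] \left[ \sum_{j=1}^{m} b_j (X_j - \xi_j) \right] \right\} \\
&= E\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j (Y_i - \mu_i)(X_j - \xi_j) \right] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j E[(Y_i - \mu_i)(X_j - \xi_j)] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \operatorname{Cov}(Y_i, X_j).
\end{aligned}$$

On observing that $\operatorname{Cov}(Y_i, Y_i) = V(Y_i)$, we can see that (b) is a special case of (c).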
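
As a small worked instance of part (b), it may help to write out the case $n = 2$ in full. This is only a sketch of the general argument specialized to two variables, using the same symbols as the theorem:

```latex
\begin{align*}
V(a_1 Y_1 + a_2 Y_2)
  &= E\{[a_1 (Y_1 - \mu_1) + a_2 (Y_2 - \mu_2)]^2\} \\
  &= a_1^2 E[(Y_1 - \mu_1)^2] + a_2^2 E[(Y_2 - \mu_2)^2]
     + 2 a_1 a_2 E[(Y_1 - \mu_1)(Y_2 - \mu_2)] \\
  &= a_1^2 V(Y_1) + a_2^2 V(Y_2) + 2 a_1 a_2 \operatorname{Cov}(Y_1, Y_2).
\end{align*}
```

Taking $a_1 = 1$ and $a_2 = \pm 1$ gives $V(Y_1 \pm Y_2) = V(Y_1) + V(Y_2) \pm 2\operatorname{Cov}(Y_1, Y_2)$, so the variances of the sum and the difference coincide exactly when the covariance is zero.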


EXAMPLE 5.26
Refer to Examples 5.4 and 5.20. In Example 5.20, we were interested in $Y_1 - Y_2$, the proportional amount of gasoline remaining at the end of a week. Find the variance of $Y_1 - Y_2$.

Solution
Using Theorem 5.12, we have

$$V(Y_1 - Y_2) = V(Y_1) + V(Y_2) - 2\operatorname{Cov}(Y_1, Y_2).$$

Because

$$f_1(y_1) = \begin{cases} 3y_1^2, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases}$$

and

$$f_2(y_2) = \begin{cases} (3/2)(1 - y_2^2), & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere,} \end{cases}$$

it follows that

$$E(Y_1^2) = \int_0^1 3y_1^4\, dy_1 = \frac{3}{5},$$

$$E(Y_2^2) = \int_0^1 \frac{3}{2} y_2^2 (1 - y_2^2)\, dy_2 = \frac{3}{2} \left[ \frac{1}{3} - \frac{1}{5} \right] = \frac{1}{5}.$$

From Example 5.20, we have $E(Y_1) = 3/4$ and $E(Y_2) = 3/8$. Thus,

$$V(Y_1) = (3/5) - (3/4)^2 = .04 \quad \text{and} \quad V(Y_2) = (1/5) - (3/8)^2 = .06.$$

In Example 5.22, we determined that $\operatorname{Cov}(Y_1, Y_2) = .02$. Therefore,

$$V(Y_1 - Y_2) = V(Y_1) + V(Y_2) - 2\operatorname{Cov}(Y_1, Y_2) = .04 + .06 - 2(.02) = .06.$$

The standard deviation of $Y_1 - Y_2$ is then $\sqrt{.06} = .245$.
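
Because Example 5.26 works with values rounded to two decimal places, it can be useful to see the exact figures. The sketch below assumes the marginal densities $f_1(y_1) = 3y_1^2$ and $f_2(y_2) = (3/2)(1 - y_2^2)$ on $[0, 1]$ and the value $E(Y_1 Y_2) = 3/10$ from Example 5.22; the helper `moment` and the dictionary encoding of a polynomial density are our own conventions.

```python
from fractions import Fraction as F
from math import sqrt

def moment(coeffs, k):
    """Integral over [0, 1] of y**k * f(y), where f(y) = sum of c * y**j
    and coeffs maps each power j to its coefficient c."""
    return sum(c / (j + k + 1) for j, c in coeffs.items())

f1 = {2: F(3)}                      # f1(y1) = 3 * y1**2
f2 = {0: F(3, 2), 2: F(-3, 2)}      # f2(y2) = (3/2) * (1 - y2**2)

v1 = moment(f1, 2) - moment(f1, 1) ** 2          # V(Y1)
v2 = moment(f2, 2) - moment(f2, 1) ** 2          # V(Y2)
cov = F(3, 10) - moment(f1, 1) * moment(f2, 1)   # E(Y1*Y2) from Example 5.22
v_diff = v1 + v2 - 2 * cov                       # Theorem 5.12(b), a = (1, -1)

print(v1, v2, cov)
print(v_diff, float(v_diff), sqrt(v_diff))
```

The exact variance is 19/320, about .0594, and the exact standard deviation is about .244; the example's .06 and .245 reflect the two-decimal rounding of the inputs.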
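EXAMPLE 5.27
Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. (These variables may denote the outcomes of $n$ independent trials of an experiment.) Define

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

and show that $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$.

Solution
Notice that $\bar{Y}$ is a linear function of $Y_1, Y_2, \ldots, Y_n$ with all constants $a_i$ equal to $1/n$. That is,

$$\bar{Y} = \left( \frac{1}{n} \right) Y_1 + \cdots + \left( \frac{1}{n} \right) Y_n.$$

By Theorem 5.12(a),

$$E(\bar{Y}) = \sum_{i=1}^{n} a_i \mu_i = \sum_{i=1}^{n} a_i \mu = \mu \sum_{i=1}^{n} a_i = \mu \sum_{i=1}^{n} \frac{1}{n} = \frac{n\mu}{n} = \mu.$$

By Theorem 5.12(b),

$$V(\bar{Y}) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \sum\sum_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j).$$

The covariance terms all are zero because the random variables are independent. Thus,

$$V(\bar{Y}) = \sum_{i=1}^{n} \left( \frac{1}{n} \right)^2 \sigma_i^2 = \sum_{i=1}^{n} \left( \frac{1}{n} \right)^2 \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$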
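
The result $V(\bar{Y}) = \sigma^2/n$ can be confirmed exactly for a small discrete case by listing every possible sample. The setting below, $n = 3$ independent rolls of a fair six-sided die, is a hypothetical illustration chosen so that the enumeration stays small; it is not taken from the text.

```python
from fractions import Fraction as F
from itertools import product

faces = range(1, 7)   # hypothetical example: one roll of a fair die
n = 3                 # sample size, kept small so enumeration is cheap

# Mean and variance of a single roll.
mu = F(sum(faces), 6)
sigma2 = sum((F(y) - mu) ** 2 for y in faces) / 6

# All 6**n samples are equally likely; compute the mean of each one.
means = [F(sum(sample), n) for sample in product(faces, repeat=n)]
e_ybar = sum(means) / len(means)
v_ybar = sum((m - e_ybar) ** 2 for m in means) / len(means)

print(mu, sigma2)        # single-roll mean and variance
print(e_ybar, v_ybar)    # mean and variance of the sample mean
```

Here the variance of the sample mean is one third of the single-roll variance, as the example predicts for $n = 3$.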


EXAMPLE 5.28
The number of defectives $Y$ in a sample of $n = 10$ items selected from a manufacturing process follows a binomial probability distribution. An estimator of the fraction defective in the lot is the random variable $\hat{p} = Y/n$. Find the expected value and variance of $\hat{p}$.

Solution
The term $\hat{p}$ is a linear function of a single random variable $Y$, where $\hat{p} = a_1 Y$ and $a_1 = 1/n$. Then by Theorem 5.12,

$$E(\hat{p}) = a_1 E(Y) = \frac{1}{n} E(Y).$$

The expected value and variance of a binomial random variable are $np$ and $npq$, respectively. Substituting for $E(Y)$, we obtain

$$E(\hat{p}) = \frac{1}{n}(np) = p.$$

Thus, the expected value of the number of defectives $Y$, divided by the sample size, is $p$. Similarly,

$$V(\hat{p}) = a_1^2 V(Y) = \left( \frac{1}{n} \right)^2 npq = \frac{pq}{n}.$$


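EXAMPLE 5.29
Suppose that an urn contains $r$ red balls and $(N - r)$ black balls. A random sample of $n$ balls is drawn without replacement and $Y$, the number of red balls in the sample, is observed. From Chapter 3 we know that $Y$ has a hypergeometric probability distribution. Find the mean and variance of $Y$.

Solution
We will first observe some characteristics of sampling without replacement. Suppose that the sampling is done sequentially and we observe outcomes for $X_1, X_2, \ldots, X_n$, where

$$X_i = \begin{cases} 1, & \text{if the } i\text{th draw results in a red ball,} \\ 0, & \text{otherwise.} \end{cases}$$

Unquestionably, $P(X_1 = 1) = r/N$. But it is also true that $P(X_2 = 1) = r/N$ because

$$\begin{aligned}
P(X_2 = 1) &= P(X_1 = 1, X_2 = 1) + P(X_1 = 0, X_2 = 1) \\
&= P(X_1 = 1)P(X_2 = 1 \mid X_1 = 1) + P(X_1 = 0)P(X_2 = 1 \mid X_1 = 0) \\
&= \left( \frac{r}{N} \right) \left( \frac{r - 1}{N - 1} \right) + \left( \frac{N - r}{N} \right) \left( \frac{r}{N - 1} \right) = \frac{r(N - 1)}{N(N - 1)} = \frac{r}{N}.
\end{aligned}$$

The same is true for $X_k$; that is,

$$P(X_k = 1) = \frac{r}{N}, \quad k = 1, 2, \ldots, n.$$

Thus, the (unconditional) probability of drawing a red ball on any draw is $r/N$. In a similar way it can be shown that

$$P(X_j = 1, X_k = 1) = \frac{r(r - 1)}{N(N - 1)}, \quad j \ne k.$$

Now, observe that $Y = \sum_{i=1}^{n} X_i$, and, hence,

$$E(Y) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \left( \frac{r}{N} \right) = n \left( \frac{r}{N} \right).$$

To find $V(Y)$ we need $V(X_i)$ and $\operatorname{Cov}(X_i, X_j)$. Because $X_i$ is 1 with probability $r/N$ and 0 with probability $1 - (r/N)$, it follows that

$$V(X_i) = \frac{r}{N} \left( 1 - \frac{r}{N} \right).$$

Also,

$$\operatorname{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{r(r - 1)}{N(N - 1)} - \left( \frac{r}{N} \right)^2 = -\frac{r}{N} \left( 1 - \frac{r}{N} \right) \left( \frac{1}{N - 1} \right)$$

because $X_i X_j = 1$ if and only if $X_i = 1$ and $X_j = 1$, and $X_i X_j = 0$ otherwise. From Theorem 5.12, we know that

$$\begin{aligned}
V(Y) &= \sum_{i=1}^{n} V(X_i) + 2 \sum\sum_{i < j} \operatorname{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \left( \frac{r}{N} \right) \left( 1 - \frac{r}{N} \right) + 2 \sum\sum_{i < j} \left[ -\frac{r}{N} \left( 1 - \frac{r}{N} \right) \left( \frac{1}{N - 1} \right) \right] \\
&= n \left( \frac{r}{N} \right) \left( 1 - \frac{r}{N} \right) - n(n - 1) \left( \frac{r}{N} \right) \left( 1 - \frac{r}{N} \right) \left( \frac{1}{N - 1} \right)
\end{aligned}$$

because the double summation contains $n(n - 1)/2$ equal terms. A little algebra yields

$$V(Y) = n \left( \frac{r}{N} \right) \left( 1 - \frac{r}{N} \right) \left( \frac{N - n}{N - 1} \right).$$

To appreciate the usefulness of Theorem 5.12, notice that the derivations contained in Example 5.29 are much simpler than those outlined in Exercise 3.216, where the mean and variance were derived by using the probabilities associated with the hypergeometric distribution.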
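
The formulas of Example 5.29 can be compared with moments computed directly from the hypergeometric probabilities. The sketch below does this with exact fractions; the function names and the urn sizes $N = 20$, $r = 7$, $n = 5$ are hypothetical choices for illustration.

```python
from fractions import Fraction as F
from math import comb

def hypergeom_moments(N, r, n):
    """Mean and variance of Y computed directly from the hypergeometric
    probabilities p(y) = C(r, y) * C(N - r, n - y) / C(N, n)."""
    total = comb(N, n)
    pmf = {y: F(comb(r, y) * comb(N - r, n - y), total)
           for y in range(0, min(r, n) + 1)}
    mean = sum(y * p for y, p in pmf.items())
    var = sum((y - mean) ** 2 * p for y, p in pmf.items())
    return mean, var

def formula_moments(N, r, n):
    """Mean n(r/N) and variance n(r/N)(1 - r/N)(N - n)/(N - 1)."""
    p = F(r, N)
    return n * p, n * p * (1 - p) * F(N - n, N - 1)

N, r, n = 20, 7, 5   # hypothetical urn: 7 red balls among 20, sample of 5
print(hypergeom_moments(N, r, n))
print(formula_moments(N, r, n))
```

Both routes give the same exact mean and variance, while the indicator-variable route of the example avoids summing over the distribution altogether.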


Exercises 


5.102 А firm purchases two types of industrial chemicals. Type I chemical costs $3 per gallon, whereas 
type II costs $5 per gallon. The mean and variance for the number of gallons of type I chemical 
purchased, Y;, are 40 and 4, respectively. The amount of type II chemical purchased, Y2, has 
Е(Ү) = 65 gallons and V (У) = 8. Assume that Y; and Y; are independent and find the mean 
and variance of the total amount of money spent per week on the two chemicals. 


5.103 Assume that Y;, Y2, and У; are random variables, with 


E(Yi) = 2, E(%)) = —1, E(Y3) = 4, 
V(Yi)) = 4, Ү(№) = 6, V(¥3) = 8, 
Cov(Y¥,, У) = 1, Cov(Y,, Y3) = —1, Cov(Y5, Yi) = 0. 


Find EGY, + 4Y, — 6Y3) and Ү(ЗҮ, + 4Y, = 6Y3). 


5.104 In Exercise 5.3, we determined that the joint probability distribution of Y;, the number of 
married executives, and У, the number of never-married executives, is given by 


ОЈ) 
Ө 


pu у) = 


5.105 


5.106 


5.107 


5.108 


5.109 


5.110 
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where y; and уз are integers, 0 < y < 3,0 < y2 <3,and1 < у + у <3. 


a Find E(Y; + Y2) and V (Y; + Y2) by first finding the probability distribution of Y, + Y2. 

b In Exercise 5.90, we determined that Cov(Y;, Y) = -—1/3. Find E(Y, + Y2) and 
V(Y, + Y2) by using Theorem 5.12. 

In Exercise 5.8, we established that 

4Ayy», 0< у < 1,05 у <l, 

0, elsewhere 


ЈО, y2) = | 


is a valid joint probability density function. In Exercise 5.52, we established that Y; and У, are 
independent; in Exercise 5.76, we determined that E(Y; — Y2) = 0 and found the value for 
V (Y;). Find V(Y, — Y2). 

In Exercise 5.9, we determined that 

6(1—y), 0<у < у < 1, 
Оп, 92) = 
HE 0, elsewhere 
is a valid joint probability density function. In Exercise 5.76, we derived the fact that 


Е(Ү —3Y2) = —5/4; in Exercise 5.92, we proved that Cov(Y;, Y2) = 1/40. Find V (Yı —3Y2). 


In Exercise 5.12, we were given the following joint probability density function for the random 
variables Y; and Y2, which were the proportions of two components in a sample from a mixture 
of insecticide: 


2, O<y<10<y<10<y+y <1, 


0, elsewhere. 


fou | 


For the two chemicals under consideration, an important quantity is the total proportion Ү + Y; 
found in any sample. Find E(Y,; + Y2) and V (Y; + Y2). 


If Y; is the total time between a customer's arrival in the store and departure from the service 
window and if Y; is the time spent in line before reaching the window, the joint density of these 
variables was given in Exercise 5.15 to be 


e", 0< уху < оо, 
fou у) = 

0, elsewhere. 
The random variable Y, — Y, represents the time spent at the service window. Find E (Y; — Y2) 
and V (Y, — Y2). Is it highly likely that a randomly selected customer would spend more than 
4 minutes at the service window? 


In Exercise 5.16, Y; and Y; denoted the proportions of time that employees I and П actually 
spent working on their assigned tasks during a workday. The joint density of Yı and У» is 
given by 


K ) Xx»t» 0<у<1,0<у,<1, 

1,32) = 

* 0, elsewhere. 

In Exercise 5.80, we derived the mean of the productivity measure 30Y, + 25Y>. Find the vari- 
ance of this measure of productivity. Give an interval in which you think the total produc- 
tivity measures of the two employees should lie for at least 75% of the days in question. 


Suppose that Y, and Y, have correlation coefficient p = .2. What is is the value of the correlation 
coefficient between 
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5.111 


5.112 


5.113 


5.114 


5.115 


5.116 
*5.117 


a l-2Y, and 3 + 4Y? 
b 1+2Y, and3 — 4Y5? 
с 1—2Y; and 3 — 4Y?? 
A retail grocery merchant figures that her daily gain X from sales is a normally distributed 
random variable with и = 50 апас = 3 (measurements in dollars). X can be negative if she is 
forced to dispose of enough perishable goods. Also, she figures daily overhead costs Y to have 
a gamma distribution with о = 4 and В = 2. If X and Y are independent, find the expected 


value and variance of her net daily gain. Would you expect her net gain for tomorrow to rise 
above $70? 


In Exercise 5.18, Y; and У, denoted the lengths of life, in hundreds of hours, for components 
of types I and II, respectively, in an electronic system. The joint density of Y; and Y is 


i (1/8)yie O92, yp > 0, у > 0, 
Xp yx = 


| elsewhere. 
The cost C of replacing the two components depends upon their length of life at failure and is 
given by С = 50+ 2Y, + 4Y>. Find E(C) and V (C). 


Suppose that Y, and Y; have correlation coefficient py, y, and for constants a, b, c and d let 

Wi =a+by, and № = c + а}. 

а Show that the correlation coefficient between W, апа W2, руу, у, is such that | py, y,| = 
[ЖАР 


b Does this result explain the results that you obtained in Exercise 5.110? 


For the daily output of an industrial operation, let Y; denote the amount of sales and Y3, the 
costs, in thousands of dollars. Assume that the density functions for Y; and Y; are given by 


(1/6yje7^, у > 0, der", yy 5-0, 
AQ) = | and А02) = 
0, yı < 0, 0, y» < 0. 
The daily profit is given by U = Y, — Y3. 
a Find E(U). 
b Assuming that Y; and Y; are independent, find V (U). 
c Would you expect the daily profit to drop below zero very often? Why? 


Refer to Exercise 5.88. If Y denotes the number of tosses of the die until you observe each of 
the six faces, Y = Y, + Y; + Y; + Y, + Ys + Yo where Y, is the trial on which the first face 


is tossed, Y; = 1, Y? is the number of additional tosses required to get a face different than 
the first, У; is the number of additional tosses required to get a face different than the first two 
distinct faces, ..., Ye is the number of additional tosses to get the last remaining face after all 


other faces have been observed. 


a Show that Cov(Y;, У;) = 0, i, j =1,2,...,6,1 Æ j. 
b Use Theorem 5.12 to find V(Y). 
c Give an interval that will contain Y with probability at least 3/4. 


Refer to Exercise 5.75. Use Theorem 5.12 to explain why V (Y; + Y2) = V (Y, — Y2). 


A population of N alligators is to be sampled in order to obtain an approximate measure of 
the difference between the proportions of sexually mature males and sexually mature females.
Obviously, this parameter has important implications for the future of the population. Assume
that n animals are to be sampled without replacement. Let $Y_1$ denote the number of mature
females and $Y_2$ the number of mature males in the sample. If the population contains proportions
$p_1$ and $p_2$ of mature females and males, respectively (with $p_1 + p_2 < 1$), find expressions for


5.118 The total sustained load on the concrete footing of a planned building is the sum of the dead
load plus the occupancy load. Suppose that the dead load $X_1$ has a gamma distribution with
$\alpha_1 = 50$ and $\beta_1 = 2$, whereas the occupancy load $X_2$ has a gamma distribution with $\alpha_2 = 20$
and $\beta_2 = 2$. (Units are in kips.) Assume that $X_1$ and $X_2$ are independent.

a Find the mean and variance of the total sustained load on the footing.

b Find a value for the sustained load that will be exceeded with probability less than 1/16.


5.9 The Multinomial Probability Distribution


Recall from Chapter 3 that a binomial random variable results from an experiment
consisting of n trials with two possible outcomes per trial. Frequently we encounter
similar situations in which the number of possible outcomes per trial is more than
two. For example, experiments that involve blood typing typically have at least four
possible outcomes per trial. Experiments that involve sampling for defectives may
categorize the type of defects observed into more than two classes.

A multinomial experiment is a generalization of the binomial experiment.


DEFINITION 5.11 A multinomial experiment possesses the following properties:

1. The experiment consists of n identical trials.

2. The outcome of each trial falls into one of k classes or cells.

3. The probability that the outcome of a single trial falls into cell i is $p_i$,
$i = 1, 2, \ldots, k$, and remains the same from trial to trial. Notice that
$p_1 + p_2 + p_3 + \cdots + p_k = 1$.

4. The trials are independent.

5. The random variables of interest are $Y_1, Y_2, \ldots, Y_k$, where $Y_i$ equals
the number of trials for which the outcome falls into cell i. Notice that
$Y_1 + Y_2 + Y_3 + \cdots + Y_k = n$.


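The joint probability function for $Y_1, Y_2, \ldots, Y_k$ is given by

$$p(y_1, y_2, \ldots, y_k) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k},$$

where

$$\sum_{i=1}^{k} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{k} y_i = n.$$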


Finding the probability that the n trials in a multinomial experiment result in
$(Y_1 = y_1, Y_2 = y_2, \ldots, Y_k = y_k)$ is an excellent application of the probabilistic
methods of Chapter 2. We leave this problem as an exercise.


DEFINITION 5.12 Assume that $p_1, p_2, \ldots, p_k$ are such that $\sum_{i=1}^{k} p_i = 1$ and $p_i > 0$ for
$i = 1, 2, \ldots, k$. The random variables $Y_1, Y_2, \ldots, Y_k$ are said to have a multinomial
distribution with parameters n and $p_1, p_2, \ldots, p_k$ if the joint probability
function of $Y_1, Y_2, \ldots, Y_k$ is given by

$$p(y_1, y_2, \ldots, y_k) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k},$$

where, for each i, $y_i = 0, 1, 2, \ldots, n$ and $\sum_{i=1}^{k} y_i = n$.


Many experiments involving classification are multinomial experiments. For ex- 
ample, classifying people into five income brackets results in an enumeration or count 
corresponding to each of five income classes. Or we might be interested in studying 
the reaction of mice to a particular stimulus in a psychological experiment. If the mice 
can react in one of three ways when the stimulus is applied, the experiment yields 
the number of mice falling into each reaction class. Similarly, a traffic study might 
require a count and classification of the types of motor vehicles using a section of 
highway. An industrial process might manufacture items that fall into one of three 
quality classes: acceptable, seconds, and rejects. A student of the arts might classify 
paintings into one of k categories according to style and period, or we might wish 
to classify philosophical ideas of authors in a study of literature. The result of an 
advertising campaign might yield count data indicating a classification of consumer 
reactions. Many observations in the physical sciences are not amenable to measure- 
ment on a continuous scale and hence result in enumerative data that correspond to 
the numbers of observations falling into various classes. 

Notice that the binomial experiment is a special case of the multinomial experiment 
(when there are k = 2 classes). 


EXAMPLE 5.30 According to recent census figures, the proportions of adults (persons over 18 years
of age) in the United States associated with five age categories are as given in the
following table.

Age      Proportion
18-24    .18
25-34    .23
35-44    .16
45-64    .27
65+      .16

If these figures are accurate and five adults are randomly sampled, find the probability
that the sample contains one person between the ages of 18 and 24, two between the
ages of 25 and 34, and two between the ages of 45 and 64.


Solution We will number the five age classes 1, 2, 3, 4, and 5 from top to bottom and will
assume that the proportions given are the probabilities associated with each of the
classes. Then we wish to find

$$p(y_1, y_2, y_3, y_4, y_5) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!\, y_5!}\, p_1^{y_1} p_2^{y_2} p_3^{y_3} p_4^{y_4} p_5^{y_5}$$

for $n = 5$ and $y_1 = 1$, $y_2 = 2$, $y_3 = 0$, $y_4 = 2$, and $y_5 = 0$. Substituting these values
into the formula for the joint probability function, we obtain

$$p(1, 2, 0, 2, 0) = \frac{5!}{1!\, 2!\, 0!\, 2!\, 0!}\, (.18)^1 (.23)^2 (.16)^0 (.27)^2 (.16)^0 = 30(.18)(.23)^2(.27)^2 = .0208.$$
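The arithmetic in Example 5.30 can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `multinomial_pmf` and the variable names are illustrative choices, and the function simply evaluates the joint probability function of Definition 5.12.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Joint probability p(y_1, ..., y_k) of Definition 5.12.

    counts: cell counts y_1..y_k (their sum is the number of trials n)
    probs:  cell probabilities p_1..p_k (assumed to sum to 1)
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (y_1! y_2! ... y_k!)
    coefficient = factorial(n)
    for y in counts:
        coefficient //= factorial(y)
    # Product of p_i ** y_i over all cells (0.16 ** 0 evaluates to 1.0)
    return coefficient * prod(p ** y for p, y in zip(probs, counts))


# Age-class probabilities and requested counts from Example 5.30
age_probs = [0.18, 0.23, 0.16, 0.27, 0.16]
age_counts = [1, 2, 0, 2, 0]
probability = multinomial_pmf(age_counts, age_probs)
print(round(probability, 4))  # prints 0.0208
```

With k = 2 cells the same function reduces to the binomial probability function, in line with the remark that the binomial experiment is a special case.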


THEOREM 5.13 If $Y_1, Y_2, \ldots, Y_k$ have a multinomial distribution with parameters n and
$p_1, p_2, \ldots, p_k$, then
1. $E(Y_i) = np_i$, $V(Y_i) = np_iq_i$.
2. $\text{Cov}(Y_s, Y_t) = -np_sp_t$, if $s \ne t$.


Proof The marginal distribution of $Y_i$ can be used to derive the mean and variance.
Recall that $Y_i$ may be interpreted as the number of trials falling into cell i.
Imagine all of the cells, excluding cell i, combined into a single large cell. Then
every trial will result in cell i or in a cell other than cell i, with probabilities
$p_i$ and $1 - p_i$, respectively. Thus, $Y_i$ possesses a binomial marginal probability
distribution. Consequently,

$$E(Y_i) = np_i \quad \text{and} \quad V(Y_i) = np_iq_i, \quad \text{where } q_i = 1 - p_i.$$

The same results can be obtained by setting up the expectations and evaluating.
For example,

$$E(Y_1) = \sum_{y_1} \sum_{y_2} \cdots \sum_{y_k} y_1\, \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}.$$

Because we have already derived the expected value and variance of $Y_i$, we
leave the summation of this expectation to the interested reader.
The proof of part 2 uses Theorem 5.12. Think of the multinomial experiment
as a sequence of n independent trials and define, for $s \ne t$,

$$U_i = \begin{cases} 1, & \text{if trial } i \text{ results in class } s, \\ 0, & \text{otherwise,} \end{cases}$$

and

$$W_i = \begin{cases} 1, & \text{if trial } i \text{ results in class } t, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$Y_s = \sum_{i=1}^{n} U_i \quad \text{and} \quad Y_t = \sum_{j=1}^{n} W_j.$$

(Because $U_i = 1$ or 0 depending upon whether the ith trial resulted in class s,
$Y_s$ is simply the sum of a series of 0s and 1s. A 1 occurs in the sum every time
we observe an item from class s, and a 0 occurs every time we observe any other
class. Thus, $Y_s$ is simply the number of times class s is observed. A similar
interpretation applies to $Y_t$.)

Notice that $U_i$ and $W_i$ cannot both equal 1 (the ith item cannot simultaneously
be in classes s and t). Thus, the product $U_iW_i$ always equals zero, and
$E(U_iW_i) = 0$. The following results allow us to evaluate $\text{Cov}(Y_s, Y_t)$:


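$$E(U_i) = p_s,$$
$$E(W_j) = p_t,$$
$$\text{Cov}(U_i, W_j) = 0, \quad \text{if } i \ne j \text{ because the trials are independent,}$$
$$\text{Cov}(U_i, W_i) = E(U_iW_i) - E(U_i)E(W_i) = 0 - p_sp_t.$$

From Theorem 5.12, we then have

$$\begin{aligned}
\text{Cov}(Y_s, Y_t) &= \sum_{i=1}^{n} \sum_{j=1}^{n} \text{Cov}(U_i, W_j) \\
&= \sum_{i=1}^{n} \text{Cov}(U_i, W_i) + \mathop{\sum\sum}_{i \ne j} \text{Cov}(U_i, W_j) \\
&= \sum_{i=1}^{n} (-p_sp_t) + \mathop{\sum\sum}_{i \ne j} 0 = -np_sp_t.
\end{aligned}$$

The covariance here is negative, which is to be expected because a large number
of outcomes in cell s would force the number in cell t to be small.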
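Theorem 5.13 can also be checked by brute force for a small experiment. The sketch below is an illustration only: the values n = 6 and (.5, .3, .2) are assumed for the demonstration and do not come from the text, and the helper names are hypothetical. It sums over every possible outcome of the multinomial experiment and compares the results with $np_i$, $np_iq_i$, and $-np_sp_t$.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Joint multinomial probability of the given cell counts."""
    coefficient = factorial(sum(counts))
    for y in counts:
        coefficient //= factorial(y)
    return coefficient * prod(p ** y for p, y in zip(probs, counts))


def exact_moments(n, probs, s, t):
    """Return E(Y_s), V(Y_s), and Cov(Y_s, Y_t) by summing over all outcomes."""
    k = len(probs)
    mean_s = mean_t = second_s = cross = 0.0
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue  # only outcomes with y_1 + ... + y_k = n are possible
        weight = multinomial_pmf(counts, probs)
        mean_s += counts[s] * weight
        mean_t += counts[t] * weight
        second_s += counts[s] ** 2 * weight
        cross += counts[s] * counts[t] * weight
    return mean_s, second_s - mean_s ** 2, cross - mean_s * mean_t


# Illustrative values (assumed, not from the text)
n, probs = 6, [0.5, 0.3, 0.2]
mean_1, var_1, cov_12 = exact_moments(n, probs, 0, 1)
print(mean_1, var_1, cov_12)
```

Up to floating-point rounding, the three printed values should agree with $np_1 = 3$, $np_1q_1 = 1.5$, and $-np_1p_2 = -0.9$.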


Inferential problems associated with the multinomial experiment will be dis- 
cussed later. 


Exercises 


A learning experiment requires a rat to run a maze (a network of pathways) until it locates one
of three possible exits. Exit 1 presents a reward of food, but exits 2 and 3 do not. (If the rat
eventually selects exit 1 almost every time, learning may have taken place.) Let $Y_i$ denote the
number of times exit i is chosen in successive runnings. For the following, assume that the rat
chooses an exit at random on each run.

a Find the probability that n = 6 runs result in $Y_1 = 3$, $Y_2 = 1$, and $Y_3 = 2$.
b For general n, find $E(Y_1)$ and $V(Y_1)$.
c Find $\text{Cov}(Y_2, Y_3)$ for general n.
d To check for the rat's preference between exits 2 and 3, we may look at $Y_2 - Y_3$. Find
$E(Y_2 - Y_3)$ and $V(Y_2 - Y_3)$ for general n.


A sample of size n is selected from a large lot of items in which a proportion $p_1$ contains exactly
one defect and a proportion $p_2$ contains more than one defect (with $p_1 + p_2 < 1$). The cost of
repairing the defective items in the sample is $C = Y_1 + 3Y_2$, where $Y_1$ denotes the number of
items with one defect and $Y_2$ denotes the number with two or more defects. Find the expected
value and variance of C.


5.121 Refer to Exercise 5.117. Suppose that the number N of alligators in the population is very
large, with $p_1 = .3$ and $p_2 = .1$.

a Find the probability that, in a sample of five alligators, $Y_1 = 2$ and $Y_2 = 1$.

b If n = 5, find $E\left(\frac{Y_1}{n} - \frac{Y_2}{n}\right)$ and $V\left(\frac{Y_1}{n} - \frac{Y_2}{n}\right)$.


5.122 The weights of a population of mice fed on a certain diet since birth are assumed to be normally
distributed with $\mu = 100$ and $\sigma = 20$ (measurement in grams). Suppose that a random sample
of n = 4 mice is taken from this population. Find the probability that

a exactly two weigh between 80 and 100 grams and exactly one weighs more than 100 grams.
b all four mice weigh more than 100 grams.


5.123 The National Fire Incident Reporting Service stated that, among residential fires, 73% are in
family homes, 20% are in apartments, and 7% are in other types of dwellings. If four residential
fires are independently reported on a single day, what is the probability that two are in family
homes, one is in an apartment, and one is in another type of dwelling?


5.124 The typical cost of damages caused by a fire in a family home is $20,000. Comparable costs
for an apartment fire and for fire in other dwelling types are $10,000 and $2000, respectively.
If four fires are independently reported, use the information in Exercise 5.123 to find the

a expected total damage cost.

b variance of the total damage cost.


5.125 When commercial aircraft are inspected, wing cracks are reported as nonexistent, detectable,
or critical. The history of a particular fleet indicates that 70% of the planes inspected have no
wing cracks, 25% have detectable wing cracks, and 5% have critical wing cracks. Five planes
are randomly selected. Find the probability that

a one has a critical crack, two have detectable cracks, and two have no cracks.

b at least one plane has critical cracks.


5.126 A large lot of manufactured items contains 10% with exactly one defect, 5% with more than
one defect, and the remainder with no defects. Ten items are randomly selected from this lot
for sale. If $Y_1$ denotes the number of items with one defect and $Y_2$, the number with more than
one defect, the repair costs are $Y_1 + 3Y_2$. Find the mean and variance of the repair costs.


5.127 Refer to Exercise 5.126. Let Y denote the number of items among the ten that contain at least
one defect. Find the probability that Y

a equals 2.
b is at least 1.

The Bivariate Normal Distribution (Optional)


No discussion of multivariate probability distributions would be complete without
reference to the multivariate normal distribution, which is a keystone of much modern
statistical theory. In general, the multivariate normal density function is defined for
k continuous random variables, $Y_1, Y_2, \ldots, Y_k$. Because of its complexity, we will
present only the bivariate density function (k = 2):

$$f(y_1, y_2) = \frac{e^{-Q/2}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}, \quad -\infty < y_1 < \infty, \ -\infty < y_2 < \infty,$$

where

$$Q = \frac{1}{1 - \rho^2}\left[\frac{(y_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(y_1 - \mu_1)(y_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2}\right].$$

The bivariate normal density function is a function of five parameters: $\mu_1$, $\mu_2$, $\sigma_1^2$,
$\sigma_2^2$, and $\rho$. The choice of notation employed for these parameters is not coincidental.
In Exercise 5.128, you will show that the marginal distributions of $Y_1$ and $Y_2$ are
normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
With a bit of somewhat tedious integration, we can show that $\text{Cov}(Y_1, Y_2) = \rho\sigma_1\sigma_2$.

If $\text{Cov}(Y_1, Y_2) = 0$ or, equivalently, if $\rho = 0$, then

$$f(y_1, y_2) = g(y_1)h(y_2),$$

where $g(y_1)$ is a nonnegative function of $y_1$ alone and $h(y_2)$ is a nonnegative function
of $y_2$ alone. Therefore, if $\rho = 0$, Theorem 5.5 implies that $Y_1$ and $Y_2$ are independent.
Recall that zero covariance for two random variables does not generally imply
independence. However, if $Y_1$ and $Y_2$ have a bivariate normal distribution, they are
independent if and only if their covariance is zero.

The joint density function for k > 2 is most easily expressed by
using matrix algebra. A discussion of the general case can be found in the references
at the end of this chapter.
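The factorization of the bivariate normal density when $\rho = 0$ can be illustrated numerically. This is a minimal sketch with hypothetical function names and arbitrarily chosen parameter values (none of them come from the text); it evaluates the density given above and compares it with the product of two univariate normal densities.

```python
from math import exp, pi, sqrt


def normal_pdf(y, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def bivariate_normal_pdf(y1, y2, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal density f(y1, y2) = exp(-Q/2) / (2 pi s1 s2 sqrt(1 - rho^2))."""
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2
    q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return exp(-q / 2) / (2 * pi * sigma1 * sigma2 * sqrt(1 - rho ** 2))


# Arbitrary illustrative point and parameters (assumed for this sketch)
joint = bivariate_normal_pdf(0.4, -1.2, 0.0, 1.0, 2.0, 0.5, rho=0.0)
factored = normal_pdf(0.4, 0.0, 2.0) * normal_pdf(-1.2, 1.0, 0.5)
correlated = bivariate_normal_pdf(0.4, -1.2, 0.0, 1.0, 2.0, 0.5, rho=0.6)
print(joint, factored, correlated)
```

With $\rho = 0$ the first two values agree up to rounding, whereas the value computed with $\rho = 0.6$ differs from the product of the marginal densities.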


Exercises 


Let $Y_1$ and $Y_2$ have a bivariate normal distribution.

a Show that the marginal distribution of $Y_1$ is normal with mean $\mu_1$ and variance $\sigma_1^2$.

b What is the marginal distribution of $Y_2$?


*5.129 Let $Y_1$ and $Y_2$ have a bivariate normal distribution. Show that the conditional distribution of
$Y_1$ given that $Y_2 = y_2$ is a normal distribution with mean $\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y_2 - \mu_2)$ and variance
$\sigma_1^2(1 - \rho^2)$.


*5.130 Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$ for
$i = 1, 2, \ldots, n$. Let

$$U_1 = \sum_{i=1}^{n} a_iY_i \quad \text{and} \quad U_2 = \sum_{i=1}^{n} b_iY_i,$$

where $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are constants. $U_1$ and $U_2$ are said to be orthogonal if
$\text{Cov}(U_1, U_2) = 0$.

a Show that $U_1$ and $U_2$ are orthogonal if and only if $\sum_{i=1}^{n} a_ib_i = 0$.

b Suppose, in addition, that $Y_1, Y_2, \ldots, Y_n$ have a multivariate normal distribution. Then $U_1$
and $U_2$ have a bivariate normal distribution. Show that $U_1$ and $U_2$ are independent if they
are orthogonal.


Let $Y_1$ and $Y_2$ be independent normally distributed random variables with means $\mu_1$ and $\mu_2$,
respectively, and variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

a Show that $Y_1$ and $Y_2$ have a bivariate normal distribution with $\rho = 0$.

b Consider $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$. Use the result in Exercise 5.130 to show that
$U_1$ and $U_2$ have a bivariate normal distribution and that $U_1$ and $U_2$ are independent.


Refer to Exercise 5.131. What are the marginal distributions of $U_1$ and $U_2$?


Conditional Expectations 


Section 5.3 contains a discussion of conditional probability functions and conditional 
density functions, which we will now relate to conditional expectations. Conditional 
expectations are defined in the same manner as univariate expectations except that 
conditional densities and probability functions are used in place of their marginal 
counterparts. 


DEFINITION 5.13 If $Y_1$ and $Y_2$ are any two random variables, the conditional expectation of $g(Y_1)$,
given that $Y_2 = y_2$, is defined to be

$$E(g(Y_1) \mid Y_2 = y_2) = \int_{-\infty}^{\infty} g(y_1)\, f(y_1 \mid y_2)\, dy_1$$

if $Y_1$ and $Y_2$ are jointly continuous and

$$E(g(Y_1) \mid Y_2 = y_2) = \sum_{\text{all } y_1} g(y_1)\, p(y_1 \mid y_2)$$

if $Y_1$ and $Y_2$ are jointly discrete.


EXAMPLE 5.31 Refer to the random variables $Y_1$ and $Y_2$ of Example 5.8, where the joint density
function is given by

$$f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the conditional expectation of the amount of sales, $Y_1$, given that $Y_2 = 1.5$.


Solution In Example 5.8, we found that, if $0 < y_2 \le 2$,

$$f(y_1 \mid y_2) = \begin{cases} 1/y_2, & 0 \le y_1 \le y_2, \\ 0, & \text{elsewhere.} \end{cases}$$

Thus, from Definition 5.13, for any value of $y_2$ such that $0 < y_2 \le 2$,

$$E(Y_1 \mid Y_2 = y_2) = \int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2)\, dy_1 = \int_0^{y_2} y_1 \left(\frac{1}{y_2}\right) dy_1 = \frac{1}{y_2}\left(\frac{y_1^2}{2}\right)\Bigg|_0^{y_2} = \frac{y_2}{2}.$$

Because we are interested in the value $y_2 = 1.5$, it follows that $E(Y_1 \mid Y_2 = 1.5) =
1.5/2 = 0.75$. That is, if the soft-drink machine contains 1.5 gallons at the start of
the day, the expected amount to be sold that day is 0.75 gallon.
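The integral in Example 5.31 can be approximated numerically as a check. The sketch below is illustrative only; the function names and the midpoint-rule step count are assumptions of this sketch, and the conditional density is the one quoted from Example 5.8.

```python
def conditional_density(y1, y2):
    """f(y1 | y2) = 1/y2 on 0 <= y1 <= y2, and 0 elsewhere."""
    return 1.0 / y2 if 0.0 <= y1 <= y2 else 0.0


def conditional_mean(y2, steps=10_000):
    """Midpoint-rule approximation of the integral of y1 * f(y1 | y2) over [0, y2]."""
    width = y2 / steps
    total = 0.0
    for i in range(steps):
        midpoint = (i + 0.5) * width
        total += midpoint * conditional_density(midpoint, y2) * width
    return total


expected_sales = conditional_mean(1.5)
print(expected_sales)
```

Because the integrand is linear in $y_1$, the midpoint rule reproduces $y_2/2 = 0.75$ up to floating-point rounding.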


In general, the conditional expectation of $Y_1$ given $Y_2 = y_2$ is a function of $y_2$. If
we now let $Y_2$ range over all of its possible values, we can think of the conditional
expectation $E(Y_1 \mid Y_2)$ as a function of the random variable $Y_2$. In Example 5.31,
we obtained $E(Y_1 \mid Y_2 = y_2) = y_2/2$. It follows that $E(Y_1 \mid Y_2) = Y_2/2$. Because
$E(Y_1 \mid Y_2)$ is a function of the random variable $Y_2$, it is itself a random variable; and
as such, it has a mean and a variance. We consider the mean of this random variable
in Theorem 5.14 and the variance in Theorem 5.15.


THEOREM 5.14 Let $Y_1$ and $Y_2$ denote random variables. Then

$$E(Y_1) = E[E(Y_1 \mid Y_2)],$$

where on the right-hand side the inside expectation is with respect to the conditional
distribution of $Y_1$ given $Y_2$ and the outside expectation is with respect
to the distribution of $Y_2$.


Proof Suppose that $Y_1$ and $Y_2$ are jointly continuous with joint density function
$f(y_1, y_2)$ and marginal densities $f_1(y_1)$ and $f_2(y_2)$, respectively. Then

$$\begin{aligned}
E(Y_1) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 f(y_1, y_2)\, dy_1\, dy_2 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2) f_2(y_2)\, dy_1\, dy_2 \\
&= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2)\, dy_1 \right\} f_2(y_2)\, dy_2 \\
&= \int_{-\infty}^{\infty} E(Y_1 \mid Y_2 = y_2) f_2(y_2)\, dy_2 = E[E(Y_1 \mid Y_2)].
\end{aligned}$$

The proof is similar for the discrete case.


EXAMPLE 5.32 A quality control plan for an assembly line involves sampling n = 10 finished items
per day and counting Y, the number of defectives. If p denotes the probability of
observing a defective, then Y has a binomial distribution, assuming that a large number
of items are produced by the line. But p varies from day to day and is assumed to have
a uniform distribution on the interval from 0 to 1/4. Find the expected value of Y.


Solution From Theorem 5.14, we know that $E(Y) = E[E(Y \mid p)]$. For a given p, Y has a
binomial distribution, and hence $E(Y \mid p) = np$. Thus,


$$E(Y) = E[E(Y \mid p)] = E(np) = nE(p) = n\left(\frac{1/4 - 0}{2}\right) = \frac{n}{8},$$

and for n = 10

$$E(Y) = 10/8 = 1.25.$$

In the long run, this inspection policy will average 1.25 defectives per day.

The conditional variance of $Y_1$ given $Y_2 = y_2$ is defined by analogy with an
ordinary variance, again using the conditional density or probability function of
$Y_1$ given $Y_2 = y_2$ in place of the ordinary density or probability function of $Y_1$.
That is,

$$V(Y_1 \mid Y_2 = y_2) = E(Y_1^2 \mid Y_2 = y_2) - [E(Y_1 \mid Y_2 = y_2)]^2.$$

As in the case of the conditional mean, the conditional variance is a function of $y_2$.
Letting $Y_2$ range over all of its possible values, we can define $V(Y_1 \mid Y_2)$ as a random
variable that is a function of $Y_2$. Specifically, if $g(y_2) = V(Y_1 \mid Y_2 = y_2)$ is a particular
function of the observed value, $y_2$, then $g(Y_2) = V(Y_1 \mid Y_2)$ is the same function of
the random variable, $Y_2$. The expected value of $V(Y_1 \mid Y_2)$ is useful in computing the
variance of $Y_1$, as detailed in Theorem 5.15.


THEOREM 5.15 Let $Y_1$ and $Y_2$ denote random variables. Then

$$V(Y_1) = E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)].$$


Proof As previously indicated, $V(Y_1 \mid Y_2)$ is given by

$$V(Y_1 \mid Y_2) = E(Y_1^2 \mid Y_2) - [E(Y_1 \mid Y_2)]^2$$

and

$$E[V(Y_1 \mid Y_2)] = E[E(Y_1^2 \mid Y_2)] - E\{[E(Y_1 \mid Y_2)]^2\}.$$

By definition,

$$V[E(Y_1 \mid Y_2)] = E\{[E(Y_1 \mid Y_2)]^2\} - \{E[E(Y_1 \mid Y_2)]\}^2.$$

The variance of $Y_1$ is

$$\begin{aligned}
V(Y_1) &= E[Y_1^2] - [E(Y_1)]^2 \\
&= E\{E[Y_1^2 \mid Y_2]\} - \{E[E(Y_1 \mid Y_2)]\}^2 \\
&= E\{E[Y_1^2 \mid Y_2]\} - E\{[E(Y_1 \mid Y_2)]^2\} + E\{[E(Y_1 \mid Y_2)]^2\} - \{E[E(Y_1 \mid Y_2)]\}^2 \\
&= E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)].
\end{aligned}$$


EXAMPLE 5.33 Refer to Example 5.32. Find the variance of Y.


Solution From Theorem 5.15 we know that

$$V(Y_1) = E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)].$$

For a given p, Y has a binomial distribution, and hence $E(Y \mid p) = np$ and $V(Y \mid p) =
npq$. Thus,

$$V(Y) = E[V(Y \mid p)] + V[E(Y \mid p)] = E(npq) + V(np) = nE[p(1 - p)] + n^2V(p).$$

Because p is uniformly distributed on the interval (0, 1/4) and $E(p^2) = V(p) +
[E(p)]^2$, it follows that

$$E(p) = \frac{1}{8}, \qquad V(p) = \frac{(1/4 - 0)^2}{12} = \frac{1}{192}, \qquad E(p^2) = \frac{1}{192} + \frac{1}{64} = \frac{1}{48}.$$

Thus,

$$\begin{aligned}
V(Y) &= nE[p(1 - p)] + n^2V(p) = n\left[E(p) - E(p^2)\right] + n^2V(p) \\
&= n\left(\frac{1}{8} - \frac{1}{48}\right) + n^2\left(\frac{1}{192}\right) = \frac{5n}{48} + \frac{n^2}{192},
\end{aligned}$$

and for n = 10,

$$V(Y) = 50/48 + 100/192 = 1.5625.$$

Thus, the standard deviation of Y is $\sigma = \sqrt{1.5625} = 1.25$.


The mean and variance of Y calculated in Examples 5.32 and 5.33 could be checked
by finding the unconditional probability function of Y and computing $E(Y)$ and $V(Y)$
directly. In doing so, we would need to find the joint distribution of Y and p. From
this joint distribution, the marginal probability function of Y can be obtained and
$E(Y)$ determined by evaluating $\sum_y y\,p(y)$. The variance can be determined in the
usual manner, again using the marginal probability function of Y. In Examples 5.32
and 5.33, we avoided working directly with these joint and marginal distributions.
Theorems 5.14 and 5.15 permitted a much quicker calculation of the desired mean
and variance. As always, the mean and variance of a random variable can be used
with Tchebysheff's theorem to provide bounds for probabilities when the distribution
of the variable is unknown or difficult to derive.
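The direct check described above can be carried out exactly with rational arithmetic. The following sketch is not from the text; the function name `marginal_pmf` is hypothetical, and the approach (expanding $(1 - p)^{n-y}$ with the binomial theorem and integrating term by term against the uniform density on (0, 1/4)) is one possible way to obtain the marginal probability function of Y.

```python
from fractions import Fraction
from math import comb


def marginal_pmf(y, n, upper=Fraction(1, 4)):
    """P(Y = y) when Y given p is binomial(n, p) and p is uniform on (0, upper)."""
    # (1 - p)^(n - y) = sum_j C(n - y, j) (-1)^j p^j, so each term integrates to
    # upper^(y + j + 1) / (y + j + 1); the uniform density contributes 1/upper.
    total = Fraction(0)
    for j in range(n - y + 1):
        power = y + j
        total += comb(n - y, j) * (-1) ** j * upper ** (power + 1) / (power + 1)
    return comb(n, y) * total / upper


n = 10
pmf = [marginal_pmf(y, n) for y in range(n + 1)]
mean = sum(y * prob for y, prob in enumerate(pmf))
variance = sum(y * y * prob for y, prob in enumerate(pmf)) - mean ** 2
print(sum(pmf), mean, variance)  # prints 1 5/4 25/16
```

The exact values 5/4 and 25/16 match $E(Y) = 1.25$ and $V(Y) = 1.5625$ from Examples 5.32 and 5.33, at the cost of considerably more computation than Theorems 5.14 and 5.15 require.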

In Examples 5.32 and 5.33, we encountered a situation where the distribution
of a random variable (Y = the number of defectives) was given conditionally for
possible values of a quantity p that could vary from day to day. The fact that p varied
was accommodated by assigning a probability distribution to this variable. This is
an example of a hierarchical model. In such models, the distribution of a variable of
interest, say, Y, is given, conditional on the value of a "parameter" $\theta$. Uncertainty
about the actual value of $\theta$ is modeled by assigning a probability distribution to it.
Once we specify the conditional distribution of Y given $\theta$ and the marginal distribution
of $\theta$, the joint distribution of Y and $\theta$ is obtained by multiplying the conditional by the
marginal. The marginal distribution of Y is then obtained from the joint distribution
by integrating or summing over the possible values of $\theta$. The results of this section
can be used to find $E(Y)$ and $V(Y)$ without finding this marginal distribution. Other
examples of hierarchical models are contained in Exercises 5.136, 5.138, 5.141 and
5.142.
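The two-stage structure of a hierarchical model can also be illustrated by simulation. The sketch below is an assumption-laden illustration rather than anything prescribed by the text: it uses the model of Examples 5.32 and 5.33 (p uniform on (0, 1/4), then Y binomial with n = 10), a fixed seed, and a hypothetical function name. The sample mean and variance should land near 1.25 and 1.5625, although simulated values vary with the seed and the number of draws.

```python
import random


def simulate_defectives(n, draws, seed=0):
    """Draw Y from the two-stage model: p ~ uniform(0, 1/4), then Y | p ~ binomial(n, p)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        p = rng.uniform(0.0, 0.25)                    # stage 1: draw the "parameter"
        y = sum(rng.random() < p for _ in range(n))   # stage 2: n Bernoulli(p) trials
        samples.append(y)
    return samples


samples = simulate_defectives(n=10, draws=100_000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((y - sample_mean) ** 2 for y in samples) / (len(samples) - 1)
print(sample_mean, sample_var)
```

Simulation of this kind only approximates the moments; Theorems 5.14 and 5.15 give them exactly.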


Exercises 


In Exercise 5.9, we determined that

$$f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function.

a Find $E(Y_1 \mid Y_2 = y_2)$.
b Use the answer derived in part (a) to find $E(Y_1)$. (Compare this with the answer found in
Exercise 5.77.)


In Examples 5.32 and 5.33, we determined that if Y is the number of defectives, $E(Y) = 1.25$
and $V(Y) = 1.5625$. Is it likely that, on any given day, Y will exceed 6?


In Exercise 5.41, we considered a quality control plan that calls for randomly selecting three 
items from the daily production (assumed large) of a certain machine and observing the number 
of defectives. The proportion p of defectives produced by the machine varies from day to day 
and has a uniform distribution on the interval (0, 1). Find the 


a expected number of defectives observed among the three sampled items. 


b variance of the number of defectives among the three sampled. 


In Exercise 5.42, the number of defects per yard in a certain fabric, Y, was known to have a
Poisson distribution with parameter $\lambda$. The parameter $\lambda$ was assumed to be a random variable
with a density function given by

$$f(\lambda) = \begin{cases} e^{-\lambda}, & \lambda \ge 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the expected number of defects per yard by first finding the conditional expectation
of Y for given $\lambda$.

b Find the variance of Y.

c Is it likely that Y exceeds 9?


In Exercise 5.38, we assumed that $Y_1$, the weight of a bulk item stocked by a supplier, had a
uniform distribution over the interval (0, 1). The random variable $Y_2$ denoted the weight of the
item sold and was assumed to have a uniform distribution over the interval $(0, y_1)$, where $y_1$
was a specific value of $Y_1$. If the supplier stocked 3/4 ton, what amount could be expected to
be sold during the week?


Assume that Y denotes the number of bacteria per cubic centimeter in a particular liquid and
that Y has a Poisson distribution with parameter $\lambda$. Further assume that $\lambda$ varies from location
to location and has a gamma distribution with parameters $\alpha$ and $\beta$, where $\alpha$ is a positive integer.
If we randomly select a location, what is the

a expected number of bacteria per cubic centimeter?

b standard deviation of the number of bacteria per cubic centimeter?


Suppose that a company has determined that the number of jobs per week, N, varies from
week to week and has a Poisson distribution with mean $\lambda$. The number of hours to complete
each job, $Y_i$, is gamma distributed with parameters $\alpha$ and $\beta$. The total time to complete all jobs
in a week is $T = \sum_{i=1}^{N} Y_i$. Note that T is the sum of a random number of random variables.
What is

a $E(T \mid N = n)$?
b $E(T)$, the expected total time to complete all jobs?


Why is $E[V(Y_1 \mid Y_2)] \le V(Y_1)$?


Let $Y_1$ have an exponential distribution with mean $\lambda$ and the conditional density of $Y_2$ given
$Y_1 = y_1$ be

$$f(y_2 \mid y_1) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find $E(Y_2)$ and $V(Y_2)$, the unconditional mean and variance of $Y_2$.


Suppose that Y has a binomial distribution with parameters n and p but that p varies from day
to day according to a beta distribution with parameters $\alpha$ and $\beta$. Show that

a $E(Y) = n\alpha/(\alpha + \beta)$.

b $V(Y) = \dfrac{n\alpha\beta(\alpha + \beta + n)}{(\alpha + \beta)^2(\alpha + \beta + 1)}$.


If $Y_1$ and $Y_2$ are independent random variables, each having a normal distribution with mean 0
and variance 1, find the moment-generating function of $U = Y_1Y_2$. Use this moment-generating
function to find $E(U)$ and $V(U)$. Check the result by evaluating $E(U)$ and $V(U)$ directly from
the density functions for $Y_1$ and $Y_2$.


Summary 


The multinomial experiment (Section 5.9) and its associated multinomial probability
distribution convey the theme of this chapter. Most experiments yield sample
measurements, $y_1, y_2, \ldots, y_k$, which may be regarded as observations on k random
variables. Inferences about the underlying structure that generates the observations
(the probabilities of falling into cells 1, 2, ..., k) are based on knowledge of the
probabilities associated with various samples $(y_1, y_2, \ldots, y_k)$. Joint, marginal, and
conditional distributions are essential concepts in finding the probabilities of various
sample outcomes.

Generally we draw from a population a sample of n observations, which are specific
values of $Y_1, Y_2, \ldots, Y_n$. Many times the random variables are independent and have
the same probability distribution. As a consequence, the concept of independence is
useful in finding the probability of observing the given sample.

The objective of this chapter has been to convey the ideas contained in the two
preceding paragraphs. The numerous details contained in the chapter are essential in
providing a solid background for a study of inference. At the same time, you should
be careful to avoid overemphasis on details; be sure to keep the broader inferential
objectives in mind.


Supplementary Exercises


Prove Theorem 5.9 when $Y_1$ and $Y_2$ are independent discrete random variables.


A technician starts a job at a time $Y_1$ that is uniformly distributed between 8:00 A.M. and
8:15 A.M. The amount of time to complete the job, $Y_2$, is an independent random variable that
is uniformly distributed between 20 and 30 minutes. What is the probability that the job will
be completed before 8:30 A.M.?


A target for a bomb is in the center of a circle with radius of 1 mile. A bomb falls at a randomly 
selected point inside that circle. If the bomb destroys everything within 1/2 mile of its landing 
point, what is the probability that the target is destroyed? 


Two friends are to meet at the library. Each independently and randomly selects an arrival time 
within the same one-hour period. Each agrees to wait a maximum of ten minutes for the other 
to arrive. What is the probability that they will meet? 


A committee of three people is to be randomly selected from a group containing four Republicans,
three Democrats, and two independents. Let $Y_1$ and $Y_2$ denote numbers of Republicans
and Democrats, respectively, on the committee.

a What is the joint probability distribution for $Y_1$ and $Y_2$?
b Find the marginal distributions of $Y_1$ and $Y_2$.
c Find $P(Y_1 = 1 \mid Y_2 \ge 1)$.


Let $Y_1$ and $Y_2$ have a joint density function given by

$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density functions of $Y_1$ and $Y_2$.
b Find $P(Y_1 \le 3/4 \mid Y_2 \le 1/2)$.
c Find the conditional density function of $Y_1$ given $Y_2 = y_2$.
d Find $P(Y_1 \le 3/4 \mid Y_2 = 1/2)$.


Refer to Exercise 5.149.

a Find $E(Y_2 \mid Y_1 = y_1)$.
b Use Theorem 5.14 to find $E(Y_2)$.
c Find $E(Y_2)$ directly from the marginal density of $Y_2$.


The length of life Y for a type of fuse has an exponential distribution with a density function
given by

$$f(y) = \begin{cases} (1/\beta)e^{-y/\beta}, & y \ge 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a If two such fuses have independent life lengths $Y_1$ and $Y_2$, find their joint probability density
function.

b One fuse from part (a) is in a primary system, and the other is in a backup system that
comes into use only if the primary system fails. The total effective life length of the two
fuses, therefore, is $Y_1 + Y_2$. Find $P(Y_1 + Y_2 \le a)$, where a > 0.


In the production of a certain type of copper, two types of copper powder (types A and B) are
mixed together and sintered (heated) for a certain length of time. For a fixed volume of sintered
copper, the producer measures the proportion $Y_1$ of the volume due to solid copper (some pores
will have to be filled with air) and the proportion $Y_2$ of the solid mass due to type A crystals.
Assume that appropriate probability densities for $Y_1$ and $Y_2$ are

$$f_1(y_1) = \begin{cases} 6y_1(1 - y_1), & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases}$$

$$f_2(y_2) = \begin{cases} 3y_2^2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

The proportion of the sample volume due to type A crystals is then $Y_1Y_2$. Assuming that $Y_1$
and $Y_2$ are independent, find $P(Y_1Y_2 \le .5)$.


Suppose that the number of eggs laid by a certain insect has a Poisson distribution with mean
$\lambda$. The probability that any one egg hatches is p. Assume that the eggs hatch independently of
one another. Find the

a expected value of Y, the total number of eggs that hatch.
b variance of Y.


In a clinical study of a new drug formulated to reduce the effects of rheumatoid arthritis,
researchers found that the proportion p of patients who respond favorably to the drug is a
random variable that varies from batch to batch of the drug. Assume that p has a probability
density function given by

$$f(p) = \begin{cases} 12p^2(1 - p), & 0 \le p \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Suppose that n patients are injected with portions of the drug taken from the same batch. Let
Y denote the number showing a favorable response. Find

a the unconditional probability distribution of Y for general n.
b $E(Y)$ for n = 2.


Suppose that $Y_1$, $Y_2$, and $Y_3$ are independent $\chi^2$-distributed random variables with $\nu_1$, $\nu_2$, and
$\nu_3$ degrees of freedom, respectively, and that $W_1 = Y_1 + Y_2$ and $W_2 = Y_1 + Y_3$.

a In Exercise 5.87, you derived the mean and variance of $W_1$. Find $\text{Cov}(W_1, W_2)$.

b Explain why you expected the answer to part (a) to be positive.


Refer to Exercise 5.86. Suppose that Z is a standard normal random variable and that Y is an
independent $\chi^2$ random variable with $\nu$ degrees of freedom.

a Define $W = Z/\sqrt{Y}$. Find $\text{Cov}(Z, W)$. What assumption do you need about the value of $\nu$?
b With Z, Y, and W as above, find $\text{Cov}(Y, W)$.

c One of the covariances from parts (a) and (b) is positive, and the other is zero. Explain why.


A forester studying diseased pine trees models the number of diseased trees per acre, Y, as a
Poisson random variable with mean $\lambda$. However, $\lambda$ changes from area to area, and its random
behavior is modeled by a gamma distribution. That is, for some integer $\alpha$,

$$f(\lambda) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, \lambda^{\alpha - 1} e^{-\lambda/\beta}, & \lambda > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the unconditional probability distribution for Y.


A coin has probability p of coming up heads when tossed. In n independent tosses of the coin,
let $X_i = 1$ if the ith toss results in heads and $X_i = 0$ if the ith toss results in tails. Then
Y, the number of heads in the n tosses, has a binomial distribution and can be represented as
$Y = \sum_{i=1}^{n} X_i$. Find $E(Y)$ and $V(Y)$, using Theorem 5.12.


The negative binomial random variable Y was defined in Section 3.6 as the number of the trial
on which the rth success occurs, in a sequence of independent trials with constant probability
p of success on each trial. Let $X_i$ denote a random variable defined as the number of the trial
on which the ith success occurs, for $i = 1, 2, \ldots, r$. Now define

$$W_i = X_i - X_{i-1}, \quad i = 1, 2, \ldots, r,$$

where $X_0$ is defined to be zero. Then we can write $Y = \sum_{i=1}^{r} W_i$. Notice that the random
variables $W_1, W_2, \ldots, W_r$ have identical geometric distributions and are mutually independent.
Use Theorem 5.12 to show that $E(Y) = r/p$ and $V(Y) = r(1 - p)/p^2$.


A box contains four balls, numbered 1 through 4. One ball is selected at random from this box.
Let

$X_1 = 1$ if ball 1 or ball 2 is drawn,
$X_2 = 1$ if ball 1 or ball 3 is drawn,
$X_3 = 1$ if ball 1 or ball 4 is drawn.

The $X_i$ values are zero otherwise. Show that any two of the random variables $X_1$, $X_2$, and $X_3$
are independent but that the three together are not.


Suppose that we are to observe two independent random samples: Y;, Y2,..., Y, denoting a 
random sample from a normal distribution with mean 44; and variance бү; and X1, X5,..., Xm 
denoting a random sample from another normal distribution with mean и» and variance o2. 
An approximation for ш — шо is given by Y — X, the difference between the sample means. 
Find E(Y — X) and V(Y — X). 
5.162 In Exercise 5.65, you determined that, for \(-1 \le \alpha \le 1\), the probability density function of
\((Y_1, Y_2)\) is given by

\[ f(y_1, y_2) = \begin{cases} \left[1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}\right] e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere,} \end{cases} \]

and is such that the marginal distributions of \(Y_1\) and \(Y_2\) are both exponential with mean 1. You
also showed that \(Y_1\) and \(Y_2\) are independent if and only if \(\alpha = 0\). Give two specific and different
joint densities that yield marginal densities for \(Y_1\) and \(Y_2\) that are both exponential with mean 1.


*5.163 Refer to Exercise 5.66. If \(F_1(y_1)\) and \(F_2(y_2)\) are two distribution functions, then for any \(\alpha\),
\(-1 \le \alpha \le 1\),

\[ F(y_1, y_2) = F_1(y_1) F_2(y_2)\left[1 - \alpha\{1 - F_1(y_1)\}\{1 - F_2(y_2)\}\right] \]

is a joint distribution function such that \(Y_1\) and \(Y_2\) have marginal distribution functions \(F_1(y_1)\)
and \(F_2(y_2)\), respectively.

a If \(F_1(y_1)\) and \(F_2(y_2)\) are both distribution functions associated with exponentially
distributed random variables with mean 1, show that the joint density function of \(Y_1\) and \(Y_2\) is
the one given in Exercise 5.162.

b If \(F_1(y_1)\) and \(F_2(y_2)\) are both distribution functions associated with uniform (0, 1) random
variables, for any \(\alpha\), \(-1 \le \alpha \le 1\), evaluate \(F(y_1, y_2)\).

c Find the joint density functions associated with the distribution functions that you found in
part (b).

d Give two specific and different joint densities such that the marginal distributions of \(Y_1\) and
\(Y_2\) are both uniform on the interval (0, 1).


*5.164 Let \(X_1\), \(X_2\), and \(X_3\) be random variables, either continuous or discrete. The joint
moment-generating function of \(X_1\), \(X_2\), and \(X_3\) is defined by

\[ m(t_1, t_2, t_3) = E\left(e^{t_1 X_1 + t_2 X_2 + t_3 X_3}\right). \]

a Show that \(m(t, t, t)\) gives the moment-generating function of \(X_1 + X_2 + X_3\).
b Show that \(m(t, t, 0)\) gives the moment-generating function of \(X_1 + X_2\).
c Show that

\[ \left.\frac{\partial^{k_1 + k_2 + k_3}\, m(t_1, t_2, t_3)}{\partial t_1^{k_1}\, \partial t_2^{k_2}\, \partial t_3^{k_3}}\right|_{t_1 = t_2 = t_3 = 0} = E\left(X_1^{k_1} X_2^{k_2} X_3^{k_3}\right). \]


*5.165 Let \(X_1\), \(X_2\), and \(X_3\) have a multinomial distribution with probability function

\[ p(x_1, x_2, x_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}, \qquad \sum_{i=1}^{3} x_i = n. \]

Use the results of Exercise 5.164 to do the following:

a Find the joint moment-generating function of \(X_1\), \(X_2\), and \(X_3\).

b Use the answer to part (a) to show that the marginal distribution of \(X_1\) is binomial with
parameter \(p_1\).

c Use the joint moment-generating function to find \(\mathrm{Cov}(X_1, X_2)\).


*5.166 A box contains \(N_1\) white balls, \(N_2\) black balls, and \(N_3\) red balls (\(N_1 + N_2 + N_3 = N\)). A
random sample of \(n\) balls is selected from the box (without replacement). Let \(Y_1\), \(Y_2\), and \(Y_3\)
denote the number of white, black, and red balls, respectively, observed in the sample. Find
the correlation coefficient for \(Y_1\) and \(Y_2\). (Let \(p_i = N_i/N\), for \(i = 1, 2, 3\).)


*5.167 Let \(Y_1\) and \(Y_2\) be jointly distributed random variables with finite variances.

a Show that \([E(Y_1 Y_2)]^2 \le E(Y_1^2)\,E(Y_2^2)\). [Hint: Observe that \(E[(tY_1 - Y_2)^2] \ge 0\) for any
real number \(t\) or, equivalently,

\[ t^2 E(Y_1^2) - 2t\,E(Y_1 Y_2) + E(Y_2^2) \ge 0. \]

This is a quadratic expression of the form \(At^2 + Bt + C\); and because it is nonnegative,
we must have \(B^2 - 4AC \le 0\). The preceding inequality follows directly.]

b Let \(\rho\) denote the correlation coefficient of \(Y_1\) and \(Y_2\). Using the inequality in part (a), show
that \(\rho^2 \le 1\).


CHAPTER 6


Functions of Random Variables


6.1 Introduction
6.2 Finding the Probability Distribution of a Function of Random Variables
6.3 The Method of Distribution Functions
6.4 The Method of Transformations
6.5 The Method of Moment-Generating Functions
6.6 Multivariable Transformations Using Jacobians (Optional)
6.7 Order Statistics
6.8 Summary

References and Further Readings


6.1 Introduction


As we indicated in Chapter 1, the objective of statistics is to make inferences about 
a population based on information contained in a sample taken from that popula- 
tion. Any truly useful inference must be accompanied by an associated measure of 
goodness. Each of the topics discussed in the preceding chapters plays a role in the 
development of statistical inference. However, none of the topics discussed thus far 
pertains to the objective of statistics as closely as the study of the distributions of 
functions of random variables. This is because all quantities used to estimate popula- 
tion parameters or to make decisions about a population are functions of the n random 
observations that appear in a sample. 

To illustrate, consider the problem of estimating a population mean, \(\mu\). Intuitively,
we draw a random sample of \(n\) observations, \(y_1, y_2, \ldots, y_n\), from the population and
employ the sample mean

\[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i \]

as an estimate for \(\mu\). How good is this estimate? The answer depends upon the
behavior of the random variables \(Y_1, Y_2, \ldots, Y_n\) and their effect on the distribution of
\(\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i\).

A measure of the goodness of an estimate is the error of estimation, the difference
between the estimate and the parameter estimated (for our example, the difference
between \(\bar{y}\) and \(\mu\)). Because \(Y_1, Y_2, \ldots, Y_n\) are random variables, in repeated sampling
\(\bar{Y}\) is also a random variable (and a function of the \(n\) variables \(Y_1, Y_2, \ldots, Y_n\)). Therefore,
we cannot be certain that the error of estimation will be less than a specific value,
say, \(B\). However, if we can determine the probability distribution of the estimator \(\bar{Y}\),
this probability distribution can be used to determine the probability that the error of
estimation is less than or equal to \(B\).

To determine the probability distribution for a function of \(n\) random variables,
\(Y_1, Y_2, \ldots, Y_n\), we must find the joint probability distribution for the random variables
themselves. We generally assume that observations are obtained through random 
sampling, as defined in Section 2.12. We saw in Section 3.7 that random sampling 
from a finite population (sampling without replacement) results in dependent trials 
but that these trials become essentially independent if the population is large when 
compared to the size of the sample. 

We will assume throughout the remainder of this text that populations are large in
comparison to the sample size and consequently that the random variables obtained
through a random sample are in fact independent of one another. Thus, in the discrete
case, the joint probability function for \(Y_1, Y_2, \ldots, Y_n\), all sampled from the same
population, is given by

\[ p(y_1, y_2, \ldots, y_n) = p(y_1)\,p(y_2) \cdots p(y_n). \]

In the continuous case, the joint density function is

\[ f(y_1, y_2, \ldots, y_n) = f(y_1)\,f(y_2) \cdots f(y_n). \]

The statement "\(Y_1, Y_2, \ldots, Y_n\) is a random sample from a population with density
\(f(y)\)" will mean that the random variables are independent with common density
function \(f(y)\).


6.2 Finding the Probability Distribution
of a Function of Random Variables


We will present three methods for finding the probability distribution for a function 
of random variables and a fourth method for finding the joint distribution of several 
functions of random variables. Any one of these may be employed to find the distri- 
bution of a given function of the variables, but one of the methods usually leads to 
a simpler derivation than the others. The method that works “best” varies from one 
application to another. Hence, acquaintance with the first three methods is desirable. 
The fourth method is presented in (optional) Section 6.6. Although the first three 
methods will be discussed separately in the next three sections, a brief summary of 
each of these methods is provided here. 
Consider random variables \(Y_1, Y_2, \ldots, Y_n\) and a function \(U(Y_1, Y_2, \ldots, Y_n)\),
denoted simply as \(U\). Then three of the methods for finding the probability distribution
of \(U\) are as follows:

1. The method of distribution functions: This method is typically used when the
\(Y\)'s have continuous distributions. First, find the distribution function for \(U\),
\(F_U(u) = P(U \le u)\), by using the methods that we discussed in Chapter 5. To
do so, we must find the region in the \(y_1, y_2, \ldots, y_n\) space for which \(U \le u\) and
then find \(P(U \le u)\) by integrating \(f(y_1, y_2, \ldots, y_n)\) over this region. The
density function for \(U\) is then obtained by differentiating the distribution
function, \(F_U(u)\). A detailed account of this procedure will be presented in
Section 6.3.

2. The method of transformations: If we are given the density function of a random
variable \(Y\), the method of transformations results in a general expression for the
density of \(U = h(Y)\) for an increasing or decreasing function \(h(y)\). Then if \(Y_1\)
and \(Y_2\) have a bivariate distribution, we can use the univariate result explained
earlier to find the joint density of \(Y_1\) and \(U = h(Y_1, Y_2)\). By integrating over \(y_1\),
we find the marginal probability density function of \(U\), which is our objective.
This method will be illustrated in Section 6.4.

3. The method of moment-generating functions: This method is based on a
uniqueness theorem, Theorem 6.1, which states that, if two random variables
have identical moment-generating functions, the two random variables possess
the same probability distributions. To use this method, we must find the
moment-generating function for \(U\) and compare it with the moment-generating
functions for the common discrete and continuous random variables derived in
Chapters 3 and 4. If it is identical to one of these moment-generating functions,
the probability distribution of \(U\) can be identified because of the uniqueness
theorem. Applications of the method of moment-generating functions will be
presented in Section 6.5. Probability-generating functions can be employed
in a way similar to the method of moment-generating functions. If you are
interested in their use, see the references at the end of the chapter.


6.3 The Method of Distribution Functions


We will illustrate the method of distribution functions with a simple univariate
example. If \(Y\) has probability density function \(f(y)\) and if \(U\) is some function of \(Y\),
then we can find \(F_U(u) = P(U \le u)\) directly by integrating \(f(y)\) over the region for
which \(U \le u\). The probability density function for \(U\) is found by differentiating
\(F_U(u)\). The following example illustrates the method.


EXAMPLE 6.1


A process for refining sugar yields up to 1 ton of pure sugar per day, but the actual
amount produced, \(Y\), is a random variable because of machine breakdowns and other
slowdowns. Suppose that \(Y\) has density function given by

\[ f(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The company is paid at the rate of $300 per ton for the refined sugar, but it also has a
fixed overhead cost of $100 per day. Thus the daily profit, in hundreds of dollars, is
\(U = 3Y - 1\). Find the probability density function for \(U\).


Solution


To employ the distribution function approach, we must find

\[ F_U(u) = P(U \le u) = P(3Y - 1 \le u) = P\left(Y \le \frac{u + 1}{3}\right). \]

If \(u < -1\), then \((u + 1)/3 < 0\) and, therefore, \(F_U(u) = P(Y \le (u + 1)/3) = 0\).
Also, if \(u > 2\), then \((u + 1)/3 > 1\) and \(F_U(u) = P(Y \le (u + 1)/3) = 1\). However,
if \(-1 \le u \le 2\), the probability can be written as an integral of \(f(y)\), and

\[ P\left(Y \le \frac{u + 1}{3}\right) = \int_{-\infty}^{(u+1)/3} f(y)\,dy = \int_{0}^{(u+1)/3} 2y\,dy = \left(\frac{u + 1}{3}\right)^2. \]

(Notice that, as \(Y\) ranges from 0 to 1, \(U\) ranges from \(-1\) to 2.) Thus, the distribution
function of the random variable \(U\) is given by

\[ F_U(u) = \begin{cases} 0, & u < -1, \\ \left(\dfrac{u + 1}{3}\right)^2, & -1 \le u \le 2, \\ 1, & u > 2, \end{cases} \]

and the density function for \(U\) is

\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} (2/9)(u + 1), & -1 \le u \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]
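The result of Example 6.1 can be checked informally by simulation. The sketch below is not part of the text's derivation: it assumes we may draw \(Y\) as the square root of a uniform (0, 1) value (which works because \(F_Y(y) = y^2\) on the unit interval), and the function names, seed, and sample size are illustrative choices. It compares the fraction of simulated profits at or below \(u\) with the derived \(F_U(u)\).

```python
import math
import random


def sample_profit(rng):
    """Draw one value of U = 3Y - 1, where Y has density f(y) = 2y on [0, 1]."""
    # Assumption for this sketch: F_Y(y) = y^2 on [0, 1], so Y = sqrt(V)
    # with V uniform on (0, 1) has the required density.
    y = math.sqrt(rng.random())
    return 3.0 * y - 1.0


def cdf_profit(u):
    """Distribution function F_U(u) derived in Example 6.1."""
    if u < -1.0:
        return 0.0
    if u > 2.0:
        return 1.0
    return ((u + 1.0) / 3.0) ** 2


def empirical_cdf(values, u):
    """Fraction of simulated values that are less than or equal to u."""
    return sum(1 for v in values if v <= u) / len(values)


rng = random.Random(12345)  # arbitrary fixed seed so the run is repeatable
draws = [sample_profit(rng) for _ in range(200_000)]

for u in (-0.5, 0.5, 1.0, 1.5):
    print(u, round(empirical_cdf(draws, u), 3), round(cdf_profit(u), 3))
```

The two printed columns should be close to each other for every \(u\); they will not match exactly because the first is a simulation estimate.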


In the bivariate situation, let \(Y_1\) and \(Y_2\) be random variables with joint density
\(f(y_1, y_2)\) and let \(U = h(Y_1, Y_2)\) be a function of \(Y_1\) and \(Y_2\). Then for every point
\((y_1, y_2)\), there corresponds one and only one value of \(U\). If we can find the region of
values \((y_1, y_2)\) such that \(U \le u\), then the integral of the joint density function \(f(y_1, y_2)\)
over this region equals \(P(U \le u) = F_U(u)\). As before, the density function for
\(U\) can be obtained by differentiation.

We will illustrate these ideas with two examples. 


EXAMPLE 6.2


In Example 5.4, we considered the random variables \(Y_1\) (the proportional amount
of gasoline stocked at the beginning of a week) and \(Y_2\) (the proportional amount of
gasoline sold during the week). The joint density function of \(Y_1\) and \(Y_2\) is given by

\[ f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the probability density function for \(U = Y_1 - Y_2\), the proportional amount of
gasoline remaining at the end of the week. Use the density function of \(U\) to find \(E(U)\).
FIGURE 6.1
Region over which \(f(y_1, y_2)\) is positive, Example 6.2


Solution


The region over which \(f(y_1, y_2)\) is not zero is sketched in Figure 6.1. Also shown
there is the line \(y_1 - y_2 = u\), for a value of \(u\) between 0 and 1. Notice that any point
\((y_1, y_2)\) such that \(y_1 - y_2 \le u\) lies above the line \(y_1 - y_2 = u\).

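If \(u < 0\), the line \(y_1 - y_2 = u\) has intercept \(-u > 0\), so it lies entirely above the
region of positive density and \(F_U(u) = P(Y_1 - Y_2 \le u) = 0\). When \(u > 1\), the line
\(y_1 - y_2 = u\) has intercept \(-u < -1\) and \(F_U(u) = 1\).
For \(0 \le u \le 1\), \(F_U(u) = P(Y_1 - Y_2 \le u)\) is the integral over the dark shaded region
above the line \(y_1 - y_2 = u\). Because it is easier to integrate over the lower triangular
region, we can write, for \(0 \le u \le 1\),

\[
\begin{aligned}
F_U(u) &= P(U \le u) = 1 - P(U > u) \\
&= 1 - \int_{u}^{1} \int_{0}^{y_1 - u} 3y_1 \, dy_2 \, dy_1 \\
&= 1 - \int_{u}^{1} 3y_1 (y_1 - u) \, dy_1 \\
&= 1 - \left[ y_1^3 - \frac{3u}{2} y_1^2 \right]_{u}^{1} \\
&= \frac{1}{2}(3u - u^3).
\end{aligned}
\]

Summarizing,

\[ F_U(u) = \begin{cases} 0, & u < 0, \\ (3u - u^3)/2, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]

A graph of \(F_U(u)\) is given in Figure 6.2(a).
It follows that

\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} 3(1 - u^2)/2, & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The density function \(f_U(u)\) is graphed in Figure 6.2(b).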


FIGURE 6.2
Distribution and density functions for Example 6.2:
(a) distribution function \(F_U(u)\); (b) density function \(f_U(u)\)


We can use this derived density function to find \(E(U)\), because

\[ E(U) = \int_{0}^{1} u \left(\frac{3}{2}\right)(1 - u^2)\,du = \frac{3}{2}\left[\frac{u^2}{2} - \frac{u^4}{4}\right]_{0}^{1} = \frac{3}{8}, \]

which agrees with the value of \(E(Y_1 - Y_2)\) found in Example 5.20 by using the
methods developed in Chapter 5 for finding the expected value of a linear function of
random variables.
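As an informal check on Example 6.2, the following sketch simulates \((Y_1, Y_2)\) and compares the simulated mean of \(U = Y_1 - Y_2\) with 3/8. The sampling scheme is an assumption of this sketch rather than something stated in the text: integrating the joint density over \(y_2\) gives the marginal density \(3y_1^2\) for \(Y_1\), and for a fixed \(y_1\) the joint density does not depend on \(y_2\), so \(Y_2\) is taken as uniform on \((0, y_1)\). Names, seed, and sample size are illustrative.

```python
import random


def sample_remaining(rng):
    """Draw one value of U = Y1 - Y2 for the joint density 3*y1 on 0 <= y2 <= y1 <= 1."""
    # Marginal of Y1 is 3*y1^2 on [0, 1], so F(y1) = y1^3 and Y1 = V^(1/3).
    y1 = rng.random() ** (1.0 / 3.0)
    # Given Y1 = y1, the joint density is constant in y2 on [0, y1].
    y2 = rng.random() * y1
    return y1 - y2


def cdf_remaining(u):
    """Distribution function F_U(u) derived in Example 6.2."""
    if u < 0.0:
        return 0.0
    if u > 1.0:
        return 1.0
    return (3.0 * u - u ** 3) / 2.0


rng = random.Random(2024)  # arbitrary fixed seed
draws = [sample_remaining(rng) for _ in range(200_000)]

mean_u = sum(draws) / len(draws)
frac_below_half = sum(1 for v in draws if v <= 0.5) / len(draws)

print("simulated E(U):", round(mean_u, 4), " derived value:", 3 / 8)
print("simulated F_U(0.5):", round(frac_below_half, 4),
      " derived value:", cdf_remaining(0.5))
```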


EXAMPLE 6.3


Let \((Y_1, Y_2)\) denote a random sample of size \(n = 2\) from the uniform distribution on
the interval (0, 1). Find the probability density function for \(U = Y_1 + Y_2\).


Solution


The density function for each \(Y_i\) is

\[ f(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Therefore, because we have a random sample, \(Y_1\) and \(Y_2\) are independent, and

\[ f(y_1, y_2) = f(y_1) f(y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The random variables \(Y_1\) and \(Y_2\) have nonzero density over the unit square, as
shown in Figure 6.3. We wish to find \(F_U(u) = P(U \le u)\). The first step is to find
the points \((y_1, y_2)\) that imply \(y_1 + y_2 \le u\). The easiest way to find this region is to
locate the points that divide the regions \(U \le u\) and \(U > u\). These points lie on the
line \(y_1 + y_2 = u\).

Graphing this relationship in Figure 6.3 and arbitrarily selecting \(y_2\) as the dependent
variable, we find that the line possesses a slope equal to \(-1\) and a \(y_2\) intercept equal
to \(u\). The points associated with \(U < u\) are either above or below the line and can
be determined by testing points on either side of the line. Suppose that \(u = 1.5\).
Let \(y_1 = y_2 = 1/4\); then \(y_1 + y_2 = 1/4 + 1/4 = 1/2\) and \((y_1, y_2)\) satisfies the
inequality \(y_1 + y_2 < u\). Therefore, \(y_1 = y_2 = 1/4\) falls in the shaded region below
the line. Similarly, all points such that \(y_1 + y_2 < u\) lie below the line \(y_1 + y_2 = u\).
Thus,

\[ F_U(u) = P(U \le u) = P(Y_1 + Y_2 \le u) = \iint\limits_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2. \]

FIGURE 6.3
The region of integration for Example 6.3
yı +y <и 


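If \(u < 0\),

\[ F_U(u) = P(U \le u) = \iint\limits_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \iint\limits_{y_1 + y_2 \le u} 0\,dy_1\,dy_2 = 0 \]

and for \(u > 2\),

\[ F_U(u) = P(U \le u) = \iint\limits_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \int_{0}^{1}\int_{0}^{1} (1)\,dy_1\,dy_2 = 1. \]

For \(0 \le u \le 2\), the limits of integration depend upon the particular value of \(u\)
(where \(u\) is the \(y_2\) intercept of the line \(y_1 + y_2 = u\)). Thus, the mathematical expression
for \(F_U(u)\) changes depending on whether \(0 \le u \le 1\) or \(1 < u \le 2\).
If \(0 \le u \le 1\), the region \(y_1 + y_2 \le u\) is the shaded area in Figure 6.4. Then for
\(0 \le u \le 1\), we have

\[ F_U(u) = \iint\limits_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \int_{0}^{u}\int_{0}^{u - y_2} (1)\,dy_1\,dy_2 = \int_{0}^{u} (u - y_2)\,dy_2 = \left[u y_2 - \frac{y_2^2}{2}\right]_{0}^{u} = u^2 - \frac{u^2}{2} = \frac{u^2}{2}. \]

The solution, \(F_U(u)\), \(0 \le u \le 1\), could have been acquired directly by using
elementary geometry. The bivariate density \(f(y_1, y_2) = 1\) is uniform over the unit
square, \(0 \le y_1 \le 1\), \(0 \le y_2 \le 1\). Hence, \(F_U(u)\) is the volume of a solid with height
equal to \(f(y_1, y_2) = 1\) and a triangular cross section, as shown in Figure 6.4. Hence,

\[ F_U(u) = (\text{area of triangle}) \cdot (\text{height}) = \frac{u^2}{2}(1) = \frac{u^2}{2}. \]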

The distribution function can be acquired in a similar manner when \(u\) is defined
over the interval \(1 < u \le 2\). Although the geometric solution is easier, we will obtain
\(F_U(u)\) directly by integration. The region \(y_1 + y_2 \le u\), \(1 < u \le 2\), is the shaded area
indicated in Figure 6.5.

FIGURE 6.4
The region \(y_1 + y_2 \le u\) for \(0 \le u \le 1\)

The complement of the event \(U \le u\) is the event that \((Y_1, Y_2)\) falls in the region
\(A\) of Figure 6.5. Then for \(1 < u \le 2\),

\[
\begin{aligned}
F_U(u) &= 1 - \iint\limits_{A} f(y_1, y_2)\,dy_1\,dy_2 \\
&= 1 - \int_{u-1}^{1}\int_{u - y_2}^{1} (1)\,dy_1\,dy_2 = 1 - \int_{u-1}^{1} \Bigl( y_1 \Bigr]_{u - y_2}^{1} \,dy_2 \\
&= 1 - \int_{u-1}^{1} (1 - u + y_2)\,dy_2 = 1 - \left[(1 - u)y_2 + \frac{y_2^2}{2}\right]_{u-1}^{1} \\
&= (-u^2/2) + 2u - 1.
\end{aligned}
\]

To summarize,

\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u^2/2, & 0 \le u \le 1, \\ (-u^2/2) + 2u - 1, & 1 < u \le 2, \\ 1, & u > 2. \end{cases} \]

The distribution function for \(U\) is shown in Figure 6.6(a).
FIGURE 6.5
The region \(y_1 + y_2 \le u\), \(1 < u \le 2\)

FIGURE 6.6
Distribution and density functions for Example 6.3:
(a) distribution function \(F_U(u)\); (b) density function \(f_U(u)\)


The density function \(f_U(u)\) can be obtained by differentiating \(F_U(u)\). Thus,

\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} \dfrac{d}{du}(0) = 0, & u < 0, \\[4pt] \dfrac{d}{du}(u^2/2) = u, & 0 \le u \le 1, \\[4pt] \dfrac{d}{du}\left[(-u^2/2) + 2u - 1\right] = 2 - u, & 1 < u \le 2, \\[4pt] \dfrac{d}{du}(1) = 0, & u > 2, \end{cases} \]

or, more simply,

\[ f_U(u) = \begin{cases} u, & 0 \le u \le 1, \\ 2 - u, & 1 < u \le 2, \\ 0, & \text{otherwise.} \end{cases} \]

A graph of \(f_U(u)\) is shown in Figure 6.6(b).


Summary of the Distribution Function Method
Let \(U\) be a function of the random variables \(Y_1, Y_2, \ldots, Y_n\).

1. Find the region \(U = u\) in the \((y_1, y_2, \ldots, y_n)\) space.

2. Find the region \(U \le u\).

3. Find \(F_U(u) = P(U \le u)\) by integrating \(f(y_1, y_2, \ldots, y_n)\) over the
region \(U \le u\).

4. Find the density function \(f_U(u)\) by differentiating \(F_U(u)\). Thus,
\(f_U(u) = dF_U(u)/du\).
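The four steps above can also be carried out numerically when a closed form is awkward. The sketch below is only an approximation scheme chosen for illustration, not a method from the text: it replaces the integral in step 3 by a midpoint rule on a grid and the derivative in step 4 by a central difference. It is applied to \(U = Y_1 + Y_2\) from Example 6.3 so the output can be compared with the derived formulas; the helper names, grid size, and step size are all assumptions of the sketch.

```python
def cdf_by_grid(joint_density, h, u, lower, upper, steps=400):
    """Approximate F_U(u) = P(h(Y1, Y2) <= u) with a midpoint rule on a rectangle."""
    a1, a2 = lower
    b1, b2 = upper
    d1 = (b1 - a1) / steps
    d2 = (b2 - a2) / steps
    total = 0.0
    for i in range(steps):
        y1 = a1 + (i + 0.5) * d1
        for j in range(steps):
            y2 = a2 + (j + 0.5) * d2
            if h(y1, y2) <= u:                            # steps 1-2: region U <= u
                total += joint_density(y1, y2) * d1 * d2  # step 3: integrate f
    return total


def density_by_difference(cdf, u, eps=0.05):
    """Step 4: approximate dF_U/du with a central difference."""
    return (cdf(u + eps) - cdf(u - eps)) / (2.0 * eps)


def unit_square_density(y1, y2):
    """Joint density of two independent uniform (0, 1) variables."""
    return 1.0 if 0.0 <= y1 <= 1.0 and 0.0 <= y2 <= 1.0 else 0.0


def approx_cdf(u):
    return cdf_by_grid(unit_square_density, lambda y1, y2: y1 + y2, u,
                       lower=(0.0, 0.0), upper=(1.0, 1.0))


print("F_U(0.5) approx:", approx_cdf(0.5), " formula u^2/2:", 0.5 ** 2 / 2)
print("F_U(1.5) approx:", approx_cdf(1.5),
      " formula -u^2/2 + 2u - 1:", -(1.5 ** 2) / 2 + 2 * 1.5 - 1)
print("f_U(0.5) approx:", density_by_difference(approx_cdf, 0.5), " formula u:", 0.5)
```

The grid answer is only accurate to roughly the width of one grid cell along the boundary line, so small discrepancies from the exact formulas are expected.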


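To illustrate, we will consider the case \(U = h(Y) = Y^2\), where \(Y\) is a continuous
random variable with distribution function \(F_Y(y)\) and density function \(f_Y(y)\). If \(u \le 0\),
\(F_U(u) = P(U \le u) = P(Y^2 \le u) = 0\) and for \(u > 0\) (see Figure 6.7),

\[
\begin{aligned}
F_U(u) &= P(U \le u) = P(Y^2 \le u) \\
&= P(-\sqrt{u} \le Y \le \sqrt{u}) \\
&= \int_{-\sqrt{u}}^{\sqrt{u}} f(y)\,dy = F_Y(\sqrt{u}) - F_Y(-\sqrt{u}).
\end{aligned}
\]

FIGURE 6.7
The function \(h(y) = y^2\)


In general,

\[ F_U(u) = \begin{cases} F_Y(\sqrt{u}) - F_Y(-\sqrt{u}), & u > 0, \\ 0, & \text{otherwise.} \end{cases} \]

On differentiating with respect to \(u\), we see that

\[ f_U(u) = \begin{cases} f_Y(\sqrt{u})\left(\dfrac{1}{2\sqrt{u}}\right) + f_Y(-\sqrt{u})\left(\dfrac{1}{2\sqrt{u}}\right), & u > 0, \\ 0, & \text{otherwise,} \end{cases} \]

or, more simply,

\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left[f_Y(\sqrt{u}) + f_Y(-\sqrt{u})\right], & u > 0, \\ 0, & \text{otherwise.} \end{cases} \]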
EXAMPLE 6.4

Let \(Y\) have probability density function given by

\[ f_Y(y) = \begin{cases} \dfrac{y + 1}{2}, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the density function for \(U = Y^2\).


Solution

We know that

\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left[f_Y(\sqrt{u}) + f_Y(-\sqrt{u})\right], & u > 0, \\ 0, & \text{otherwise,} \end{cases} \]

and on substituting into this equation, we obtain

\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left(\dfrac{\sqrt{u} + 1}{2} + \dfrac{-\sqrt{u} + 1}{2}\right) = \dfrac{1}{2\sqrt{u}}, & 0 < u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Because \(Y\) has positive density only over the interval \(-1 \le y \le 1\), it follows that
\(U = Y^2\) has positive density only over the interval \(0 < u \le 1\).
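The general formula for the density of \(U = Y^2\) translates directly into a small helper. The sketch below is illustrative only (the function names are ours): it builds \(f_U\) from any supplied \(f_Y\) and evaluates it for the density of Example 6.4, printing the value next to \(1/(2\sqrt{u})\) for comparison.

```python
import math


def density_of_square(f_y):
    """Return the density f_U of U = Y**2, given the density f_y of Y."""
    def f_u(u):
        if u <= 0.0:
            return 0.0
        root = math.sqrt(u)
        # f_U(u) = [f_Y(sqrt(u)) + f_Y(-sqrt(u))] / (2 * sqrt(u)) for u > 0
        return (f_y(root) + f_y(-root)) / (2.0 * root)
    return f_u


def f_y_example(y):
    """Density from Example 6.4: (y + 1)/2 on [-1, 1], zero elsewhere."""
    return (y + 1.0) / 2.0 if -1.0 <= y <= 1.0 else 0.0


f_u = density_of_square(f_y_example)

for u in (0.04, 0.25, 0.81):
    print(u, f_u(u), 1.0 / (2.0 * math.sqrt(u)))
```

Because `f_y_example` returns zero outside \([-1, 1]\), the helper automatically returns zero for \(u > 1\), which matches the support noted at the end of the example.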


In some instances, it is possible to find a transformation that, when applied to a 
random variable with a uniform distribution on the interval (0, 1), results in a random 
variable with some other specified distribution function, say, F (y). The next example 
illustrates a technique for achieving this objective. A brief discussion of one practical 
use of this transformation follows the example. 


EXAMPLE 6.5


Let \(U\) be a uniform random variable on the interval (0, 1). Find a transformation
\(G(U)\) such that \(G(U)\) possesses an exponential distribution with mean \(\beta\).


Solution


If \(U\) possesses a uniform distribution on the interval (0, 1), then the distribution
function of \(U\) (see Exercise 4.38) is given by

\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]

Let \(Y\) denote a random variable that has an exponential distribution with mean \(\beta\).
Then (see Section 4.6) \(Y\) has distribution function

\[ F_Y(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y/\beta}, & y \ge 0. \end{cases} \]

Notice that \(F_Y(y)\) is strictly increasing on the interval \([0, \infty)\). Let \(0 < u < 1\)
and observe that there is a unique value \(y\) such that \(F_Y(y) = u\). Thus, \(F_Y^{-1}(u)\),
\(0 < u < 1\), is well defined. In this case, \(F_Y(y) = 1 - e^{-y/\beta} = u\) if and only if
\(y = -\beta \ln(1 - u) = F_Y^{-1}(u)\). Consider the random variable \(F_Y^{-1}(U) = -\beta \ln(1 - U)\)
and observe that, if \(y > 0\),

\[
\begin{aligned}
P(F_Y^{-1}(U) \le y) &= P[-\beta \ln(1 - U) \le y] \\
&= P[\ln(1 - U) \ge -y/\beta] \\
&= P(U \le 1 - e^{-y/\beta}) \\
&= 1 - e^{-y/\beta}.
\end{aligned}
\]

Also, \(P[F_Y^{-1}(U) \le y] = 0\) if \(y \le 0\). Thus, \(F_Y^{-1}(U) = -\beta \ln(1 - U)\) possesses an
exponential distribution with mean \(\beta\), as desired.
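The transformation found in Example 6.5 is easy to try out. The following sketch uses Python's standard `random.Random.random()` as the source of uniform values; the value \(\beta = 2.5\), the seed, and the sample size are arbitrary choices made for illustration. It checks two consequences of the derivation: the sample mean should be near \(\beta\), and the fraction of values at or below \(\beta\) should be near \(F_Y(\beta) = 1 - e^{-1}\).

```python
import math
import random


def exponential_from_uniform(u, beta):
    """Apply G(u) = -beta * ln(1 - u) from Example 6.5 to a uniform (0, 1) value."""
    return -beta * math.log(1.0 - u)


beta = 2.5                 # illustrative mean, not taken from the text
rng = random.Random(7)     # arbitrary fixed seed

# rng.random() returns values in [0, 1), so 1 - u is never zero.
draws = [exponential_from_uniform(rng.random(), beta) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
frac_below_beta = sum(1 for y in draws if y <= beta) / len(draws)

print("sample mean:", round(sample_mean, 3), " target mean:", beta)
print("fraction <= beta:", round(frac_below_beta, 3),
      " F_Y(beta):", round(1.0 - math.exp(-1.0), 3))
```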


Computer simulations are frequently used to evaluate proposed statistical tech- 
niques. Typically, these simulations require that we obtain observed values of random 
variables with a prescribed distribution. As noted in Section 4.4, most computer 
systems contain a subroutine that provides observed values of a random variable 
U that has a uniform distribution on the interval (0, 1). How can the result of 
Example 6.5 be used to generate a set of observations from an exponential distribution 


with mean \(\beta\)? Simply use the computer's random number generator to produce
values \(u_1, u_2, \ldots, u_n\) from a uniform (0, 1) distribution and then calculate
\(y_i = -\beta \ln(1 - u_i)\), \(i = 1, 2, \ldots, n\), to obtain values of random variables with the required
exponential distribution.

As long as a prescribed distribution function \(F(y)\) possesses a unique inverse
\(F^{-1}(\cdot)\), the preceding technique can be applied. In instances such as that illustrated
in Example 6.5, we can readily write down the form of \(F^{-1}(\cdot)\) and proceed as earlier.
If the form of a distribution function cannot be written in an easily invertible form
(recall that the distribution functions of normally, gamma-, and beta-distributed
random variables are given in tables that were obtained by using numerical integration
techniques), our task is more difficult. In these instances, other methods are used to
generate observations with the desired distribution.

In the following exercise set, you will find problems that can be solved by using the
techniques presented in this section. The exercises that involve finding \(F^{-1}(U)\) for
some specific distribution \(F(y)\) focus on cases where \(F^{-1}(\cdot)\) exists in a closed form.


Exercises


6.1 Let \(Y\) be a random variable with probability density function given by

\[ f(y) = \begin{cases} 2(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the density function of \(U_1 = 2Y - 1\).
b Find the density function of \(U_2 = 1 - 2Y\).
c Find the density function of \(U_3 = Y^2\).
d Find \(E(U_1)\), \(E(U_2)\), and \(E(U_3)\) by using the derived density functions for these random
variables.
e Find \(E(U_1)\), \(E(U_2)\), and \(E(U_3)\) by the methods of Chapter 4.


6.2 Let \(Y\) be a random variable with a density function given by

\[ f(y) = \begin{cases} (3/2)y^2, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the density function of \(U_1 = 3Y\).
b Find the density function of \(U_2 = 3 - Y\).
c Find the density function of \(U_3 = Y^2\).


6.3 A supplier of kerosene has a weekly demand \(Y\) possessing a probability density function
given by

\[ f(y) = \begin{cases} y, & 0 \le y \le 1, \\ 1, & 1 < y \le 1.5, \\ 0, & \text{elsewhere,} \end{cases} \]

with measurements in hundreds of gallons. (This problem was introduced in Exercise 4.13.)
The supplier's profit is given by \(U = 10Y - 4\).
a Find the probability density function for \(U\).
b Use the answer to part (a) to find \(E(U)\).
c Find \(E(U)\) by the methods of Chapter 4.
6.4 The amount of flour used per day by a bakery is a random variable \(Y\) that has an exponential
distribution with mean equal to 4 tons. The cost of the flour is proportional to \(U = 3Y + 1\).

a Find the probability density function for \(U\).
b Use the answer in part (a) to find \(E(U)\).


6.5 The waiting time \(Y\) until delivery of a new component for an industrial operation is uniformly
distributed over the interval from 1 to 5 days. The cost of this delay is given by \(U = 2Y^2 + 3\).
Find the probability density function for \(U\).


6.6 The joint distribution of amount of pollutant emitted from a smokestack without a cleaning
device (\(Y_1\)) and a similar smokestack with a cleaning device (\(Y_2\)) was given in Exercise 5.10
to be

\[ f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases} \]

The reduction in amount of pollutant due to the cleaning device is given by \(U = Y_1 - Y_2\).

a Find the probability density function for \(U\).
b Use the answer in part (a) to find \(E(U)\). Compare your results with those of Exercise
5.78(c).


6.7 Suppose that \(Z\) has a standard normal distribution.

a Find the density function of \(U = Z^2\).
b Does \(U\) have a gamma distribution? What are the values of \(\alpha\) and \(\beta\)?
c What is another name for the distribution of \(U\)?


6.8 Assume that \(Y\) has a beta distribution with parameters \(\alpha\) and \(\beta\).

a Find the density function of \(U = 1 - Y\).
b Identify the density of \(U\) as one of the types we studied in Chapter 4. Be sure to identify
any parameter values.
c How is \(E(U)\) related to \(E(Y)\)?
d How is \(V(U)\) related to \(V(Y)\)?


6.9 Suppose that a unit of mineral ore contains a proportion \(Y_1\) of metal A and a proportion \(Y_2\)
of metal B. Experience has shown that the joint probability density function of \(Y_1\) and \(Y_2\) is
uniform over the region \(0 \le y_1 \le 1\), \(0 \le y_2 \le 1\), \(0 \le y_1 + y_2 \le 1\). Let \(U = Y_1 + Y_2\), the
proportion of either metal A or B per unit. Find

a the probability density function for \(U\).
b \(E(U)\) by using the answer to part (a).
c \(E(U)\) by using only the marginal densities of \(Y_1\) and \(Y_2\).


6.10 The total time from arrival to completion of service at a fast-food outlet, \(Y_1\), and the time spent
waiting in line before arriving at the service window, \(Y_2\), were given in Exercise 5.15 with joint
density function

\[ f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases} \]

Another random variable of interest is \(U = Y_1 - Y_2\), the time spent at the service window. Find
a the probability density function for \(U\).
b \(E(U)\) and \(V(U)\). Compare your answers with the results of Exercise 5.108.


6.11 Suppose that two electronic components in the guidance system for a missile operate
independently and that each has a length of life governed by the exponential distribution with mean 1
(with measurements in hundreds of hours). Find the

a probability density function for the average length of life of the two components.
b mean and variance of this average, using the answer in part (a). Check your answer by
computing the mean and variance, using Theorem 5.12.


6.12 Suppose that \(Y\) has a gamma distribution with parameters \(\alpha\) and \(\beta\) and that \(c > 0\) is a constant.

a Derive the density function of \(U = cY\).
b Identify the density of \(U\) as one of the types we studied in Chapter 4. Be sure to identify
any parameter values.
c The parameters \(\alpha\) and \(\beta\) of a gamma-distributed random variable are, respectively, "shape"
and "scale" parameters. How do the scale and shape parameters for \(U\) compare to those
for \(Y\)?


6.13 If \(Y_1\) and \(Y_2\) are independent exponential random variables, both with mean \(\beta\), find the density
function for their sum. (In Exercise 5.7, we considered two independent exponential random
variables, both with mean 1, and determined \(P(Y_1 + Y_2 \le 3)\).)


6.14 In a process of sintering (heating) two types of copper powder (see Exercise 5.152), the density
function for \(Y_1\), the volume proportion of solid copper in a sample, was given by

\[ f_1(y_1) = \begin{cases} 6y_1(1 - y_1), & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The density function for \(Y_2\), the proportion of type A crystals among the solid copper, was
given as

\[ f_2(y_2) = \begin{cases} 3y_2^2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The variable \(U = Y_1 Y_2\) gives the proportion of the sample volume due to type A crystals. If \(Y_1\)
and \(Y_2\) are independent, find the probability density function for \(U\).


6.15 Let \(Y\) have a distribution function given by

\[ F(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y^2}, & y \ge 0. \end{cases} \]

Find a transformation \(G(U)\) such that, if \(U\) has a uniform distribution on the interval (0, 1),
\(G(U)\) has the same distribution as \(Y\).


6.16 In Exercise 4.15, we determined that

\[ f(y) = \begin{cases} \dfrac{b}{y^2}, & y \ge b, \\ 0, & \text{elsewhere,} \end{cases} \]

is a bona fide probability density function for a random variable, \(Y\). Assuming \(b\) is a known
constant and \(U\) has a uniform distribution on the interval (0, 1), transform \(U\) to obtain a random
variable with the same distribution as \(Y\).


6.17 A member of the power family of distributions has a distribution function given by

\[ F(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{\alpha}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases} \]

where \(\alpha, \theta > 0\).
a Find the density function.

b For fixed values of \(\alpha\) and \(\theta\), find a transformation \(G(U)\) so that \(G(U)\) has a distribution
function of \(F\) when \(U\) possesses a uniform (0, 1) distribution.

c Given that a random sample of size 5 from a uniform distribution on the interval (0, 1)
yielded the values .2700, .6901, .1413, .1523, and .3609, use the transformation derived in
part (b) to give values associated with a random variable with a power family distribution
with \(\alpha = 2\), \(\theta = 4\).


6.18 A member of the Pareto family of distributions (often used in economics to model income
distributions) has a distribution function given by

\[ F(y) = \begin{cases} 0, & y < \beta, \\ 1 - \left(\dfrac{\beta}{y}\right)^{\alpha}, & y \ge \beta, \end{cases} \]

where \(\alpha, \beta > 0\).

a Find the density function.

b For fixed values of \(\beta\) and \(\alpha\), find a transformation \(G(U)\) so that \(G(U)\) has a distribution
function of \(F\) when \(U\) has a uniform distribution on the interval (0, 1).

c Given that a random sample of size 5 from a uniform distribution on the interval (0, 1)
yielded the values .0058, .2048, .7692, .2475, and .6078, use the transformation derived in
part (b) to give values associated with a random variable with a Pareto distribution with
\(\alpha = 2\), \(\beta = 3\).


6.19 Refer to Exercises 6.17 and 6.18. If \(Y\) possesses a Pareto distribution with parameters \(\alpha\) and
\(\beta\), prove that \(X = 1/Y\) has a power family distribution with parameters \(\alpha\) and \(\theta = \beta^{-1}\).


6.20 Let the random variable \(Y\) possess a uniform distribution on the interval (0, 1). Derive the

a distribution of the random variable \(W = Y^2\).
b distribution of the random variable \(W = \sqrt{Y}\).


*6.21 Suppose that \(Y\) is a random variable that takes on only integer values 1, 2, .... Let \(F(y)\) denote
the distribution function of this random variable. As discussed in Section 4.2, this distribution
function is a step function, and the magnitude of the step at each integer value is the probability
that \(Y\) takes on that value. Let \(U\) be a continuous random variable that is uniformly distributed
on the interval (0, 1). Define a variable \(X\) such that \(X = k\) if and only if \(F(k - 1) < U \le F(k)\),
\(k = 1, 2, \ldots\). Recall that \(F(0) = 0\) because \(Y\) takes on only positive integer values. Show that
\(P(X = i) = F(i) - F(i - 1) = P(Y = i)\), \(i = 1, 2, \ldots\). That is, \(X\) has the same distribution
as \(Y\). [Hint: Recall Exercise 4.5.]¹


*6.22 Use the results derived in Exercises 4.6 and 6.21 to describe how to generate values of a
geometrically distributed random variable.


1. Exercises preceded by an asterisk are optional.


6.4 The Method of Transformations


The transformation method for finding the probability distribution of a function of
random variables is an offshoot of the distribution function method of Section 6.3.
Through the distribution function approach, we can arrive at a simple method of
writing down the density function of \(U = h(Y)\), provided that \(h(y)\) is either decreasing
or increasing. [By \(h(y)\) increasing, we mean that if \(y_1 < y_2\), then \(h(y_1) < h(y_2)\)
for any real numbers \(y_1\) and \(y_2\).] The graph of an increasing function \(h(y)\) appears in
Figure 6.8.

FIGURE 6.8
An increasing function

Suppose that $h(y)$ is an increasing function of $y$ and that $U = h(Y)$, where $Y$ has
density function $f_Y(y)$. Then $h^{-1}(u)$ is an increasing function of $u$: If $u_1 < u_2$, then
$h^{-1}(u_1) = y_1 < y_2 = h^{-1}(u_2)$. We see from Figure 6.8 that the set of points $y$ such
that $h(y) \le u_1$ is precisely the same as the set of points $y$ such that $y \le h^{-1}(u_1)$.
Therefore (see Figure 6.8),

\[ P(U \le u) = P[h(Y) \le u] = P\{h^{-1}[h(Y)] \le h^{-1}(u)\} = P[Y \le h^{-1}(u)] \]

or

\[ F_U(u) = F_Y[h^{-1}(u)]. \]

Then differentiating with respect to $u$, we have

\[ f_U(u) = \frac{dF_U(u)}{du} = \frac{dF_Y[h^{-1}(u)]}{du} = f_Y(h^{-1}(u)) \frac{d[h^{-1}(u)]}{du}. \]

To simplify notation, we will write $dh^{-1}/du$ instead of $d[h^{-1}(u)]/du$ and

\[ f_U(u) = f_Y[h^{-1}(u)] \frac{dh^{-1}}{du}. \]

Thus, we have acquired a new way to find $f_U(u)$ that evolved from the general
method of distribution functions. To find $f_U(u)$, solve for $y$ in terms of $u$; that is, find
$y = h^{-1}(u)$ and substitute this expression into $f_Y(y)$. Then multiply this quantity by
$dh^{-1}/du$. We will illustrate the procedure with an example.


EXAMPLE 6.6


In Example 6.1, we worked with a random variable $Y$ (amount of sugar produced)
with a density function given by

\[ f_Y(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

We were interested in a new random variable (profit) given by $U = 3Y - 1$. Find the
probability density function for $U$ by the transformation method.


Solution


The function of interest here is $h(y) = 3y - 1$, which is increasing in $y$. If $u = 3y - 1$,
then

\[ y = h^{-1}(u) = \frac{u + 1}{3} \quad \text{and} \quad \frac{dh^{-1}}{du} = \frac{d\left(\frac{u+1}{3}\right)}{du} = \frac{1}{3}. \]

Thus,

\[ f_U(u) = f_Y[h^{-1}(u)] \frac{dh^{-1}}{du} = \begin{cases} 2[h^{-1}(u)] \dfrac{dh^{-1}}{du} = 2\left(\dfrac{u+1}{3}\right)\left(\dfrac{1}{3}\right), & 0 \le \dfrac{u+1}{3} \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]

or, equivalently,

\[ f_U(u) = \begin{cases} 2(u + 1)/9, & -1 \le u \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]

The range over which $f_U(u)$ is positive is simply the interval $0 < y < 1$ transformed to
the $u$ axis by the function $u = 3y - 1$. This answer agrees with that of Example 6.1.
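
The result of Example 6.6 can be sanity-checked numerically. The sketch below is only an illustration: the helper names `f_y`, `f_u`, and `integrate` are assumptions of this example, not notation from the text. It evaluates $f_Y[h^{-1}(u)] \, dh^{-1}/du$ directly, confirms that the resulting density integrates to approximately 1 over $(-1, 2)$, and compares one value with the closed form $2(u+1)/9$.

```python
# Numerical check of Example 6.6 (U = 3Y - 1 with f_Y(y) = 2y on [0, 1]).
# All function names are illustrative assumptions, not notation from the text.

def f_y(y):
    """Density of Y: 2y on [0, 1], zero elsewhere."""
    return 2.0 * y if 0.0 <= y <= 1.0 else 0.0

def f_u(u):
    """Density of U via the transformation method: f_Y(h^{-1}(u)) * dh^{-1}/du."""
    h_inv = (u + 1.0) / 3.0   # y = h^{-1}(u)
    dh_inv_du = 1.0 / 3.0     # derivative of the inverse function
    return f_y(h_inv) * dh_inv_du

def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

total_mass = integrate(f_u, -1.0, 2.0)   # should be close to 1
closed_form = 2.0 * (0.5 + 1.0) / 9.0    # 2(u + 1)/9 evaluated at u = 0.5
print(total_mass, f_u(0.5), closed_form)
```

Outside the interval $(-1, 2)$ the function `f_u` returns 0, matching the "elsewhere" branch of the density.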


FIGURE 6.9 
A decreasing function 


If $h(y)$ is a decreasing function of $y$, then $h^{-1}(u)$ is a decreasing function of $u$.
That is, if $u_1 < u_2$, then $h^{-1}(u_1) = y_1 > y_2 = h^{-1}(u_2)$. Also, as in Figure 6.9, the set
of points $y$ such that $h(y) \le u_1$ is the same as the set of points such that $y \ge h^{-1}(u_1)$.

It follows that, for $U = h(Y)$, as shown in Figure 6.9,

\[ P(U \le u) = P[Y \ge h^{-1}(u)] \quad \text{or} \quad F_U(u) = 1 - F_Y[h^{-1}(u)]. \]

If we differentiate with respect to $u$, we obtain

\[ f_U(u) = -f_Y[h^{-1}(u)] \frac{d[h^{-1}(u)]}{du}. \]

If we again use the simplified notation $dh^{-1}/du$ instead of $d[h^{-1}(u)]/du$ and recall
that $dh^{-1}/du$ is negative because $h^{-1}(u)$ is a decreasing function of $u$, the density
of $U$ is

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|. \]


Actually, it is not necessary that $h(y)$ be increasing or decreasing (and hence
invertible) for all values of $y$. The function $h(\cdot)$ need only be increasing or decreasing
for the values of $y$ such that $f_Y(y) > 0$. The set of points $\{y : f_Y(y) > 0\}$ is called the
support of the density $f_Y(y)$. If $y = h^{-1}(u)$ is not in the support of the density, then
$f_Y[h^{-1}(u)] = 0$. These results are combined in the following statement:


Let $Y$ have probability density function $f_Y(y)$. If $h(y)$ is either increasing or
decreasing for all $y$ such that $f_Y(y) > 0$, then $U = h(Y)$ has density function

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|, \quad \text{where} \quad \frac{dh^{-1}}{du} = \frac{d[h^{-1}(u)]}{du}. \]


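EXAMPLE 6.7


Let $Y$ have the probability density function given by

\[ f_Y(y) = \begin{cases} 2y, & 0 < y < 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the density function of $U = -4Y + 3$.


Solution


In this example, the set of values of $y$ such that $f_Y(y) > 0$ is the interval $0 < y < 1$.
The function of interest, $h(y) = -4y + 3$, is decreasing for all $y$, and hence for all
$0 < y < 1$. If $u = -4y + 3$, then

\[ y = h^{-1}(u) = \frac{3 - u}{4} \quad \text{and} \quad \frac{dh^{-1}}{du} = -\frac{1}{4}. \]

Notice that $h^{-1}(u)$ is a decreasing function of $u$ and that $dh^{-1}/du < 0$. Thus,

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right| = \begin{cases} 2\left(\dfrac{3-u}{4}\right)\left|-\dfrac{1}{4}\right|, & 0 < \dfrac{3-u}{4} < 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Finally, some simple algebra gives

\[ f_U(u) = \begin{cases} \dfrac{3 - u}{8}, & -1 < u < 3, \\ 0, & \text{elsewhere.} \end{cases} \]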
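
A simulation offers an independent check of Example 6.7. The following sketch is illustrative only: the names `simulate_u` and `cdf_u` and the sample size are assumptions of this example. It draws values of $Y$ by the inverse-distribution-function idea ($Y = \sqrt{V}$ with $V$ uniform on (0, 1) has distribution function $y^2$ and therefore density $2y$), forms $U = -4Y + 3$, and compares the empirical value of $P(U \le 1)$ with the value obtained by integrating $(3 - u)/8$.

```python
import math
import random

def simulate_u(n, seed=0):
    """Draw n values of U = -4Y + 3, where Y has density 2y on (0, 1)."""
    rng = random.Random(seed)
    # sqrt of a uniform(0, 1) draw has CDF y^2, hence density 2y on (0, 1).
    return [-4.0 * math.sqrt(rng.random()) + 3.0 for _ in range(n)]

def cdf_u(u):
    """CDF obtained by integrating f_U(u) = (3 - u)/8 from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 3.0:
        return 1.0
    return (3.0 * u - u * u / 2.0 + 3.5) / 8.0

samples = simulate_u(200_000)
empirical = sum(s <= 1.0 for s in samples) / len(samples)
print(empirical, cdf_u(1.0))   # the two values should be close
```

Agreement here is only approximate, because the empirical proportion carries sampling error that shrinks as the number of draws grows.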


Direct application of the method of transformation requires that the function $h(y)$
be either increasing or decreasing for all $y$ such that $f_Y(y) > 0$. If you want to use this
method to find the distribution of $U = h(Y)$, you should be very careful to check that
the function $h(\cdot)$ is either increasing or decreasing for all $y$ in the support of $f_Y(y)$.
If it is not, the method of transformations cannot be used, and you should instead use
the method of distribution functions discussed in Section 6.3.

The transformation method can also be used in multivariate situations. The fol- 
lowing example illustrates the bivariate case. 


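EXAMPLE 6.8


Let $Y_1$ and $Y_2$ have a joint density function given by

\[ f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & 0 \le y_1, \; 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the density function for $U = Y_1 + Y_2$.


Solution


This problem must be solved in two stages: First, we will find the joint density of $Y_1$
and $U$; second, we will find the marginal density of $U$. The approach is to let $Y_1$ be
fixed at a value $y_1 \ge 0$. Then $U = y_1 + Y_2$, and we can consider the one-dimensional
transformation problem in which $U = h(Y_2) = y_1 + Y_2$. Letting $g(y_1, u)$ denote the
joint density of $Y_1$ and $U$, we have, with $y_2 = u - y_1 = h^{-1}(u)$,

\[ g(y_1, u) = \begin{cases} f[y_1, h^{-1}(u)] \left| \dfrac{dh^{-1}}{du} \right| = e^{-(y_1 + u - y_1)}(1), & 0 \le y_1, \; 0 \le u - y_1, \\ 0, & \text{elsewhere.} \end{cases} \]

Simplifying, we obtain

\[ g(y_1, u) = \begin{cases} e^{-u}, & 0 \le y_1 \le u, \\ 0, & \text{elsewhere.} \end{cases} \]

(Notice that $Y_1 \le U$.) The marginal density of $U$ is then given by

\[ f_U(u) = \int_{-\infty}^{\infty} g(y_1, u) \, dy_1 = \begin{cases} \displaystyle\int_0^u e^{-u} \, dy_1 = u e^{-u}, & 0 \le u, \\ 0, & \text{elsewhere.} \end{cases} \]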
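
The two stages of Example 6.8 can be mirrored in a few lines of code. This is a minimal sketch under stated assumptions: the names `joint_density`, `g`, and `marginal_u` are invented for the illustration, and the marginal integral is approximated with a simple midpoint rule rather than evaluated in closed form.

```python
import math

def joint_density(y1, y2):
    """Joint density of (Y1, Y2) from Example 6.8."""
    return math.exp(-(y1 + y2)) if y1 >= 0.0 and y2 >= 0.0 else 0.0

def g(y1, u):
    """Stage 1: joint density of (Y1, U), using h^{-1}(u) = u - y1 and |dh^{-1}/du| = 1."""
    return joint_density(y1, u - y1) * 1.0

def marginal_u(u, n=10_000):
    """Stage 2: integrate g(y1, u) over 0 <= y1 <= u with the midpoint rule."""
    if u <= 0.0:
        return 0.0
    width = u / n
    return sum(g((i + 0.5) * width, u) for i in range(n)) * width

u = 2.0
print(marginal_u(u), u * math.exp(-u))   # numeric marginal vs. u * e^{-u}
```

Because `g` vanishes whenever $y_1 > u$, restricting the integration to $[0, u]$ loses nothing.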


We will illustrate the use of the bivariate transformation with another example, 
this one involving the product of two random variables. 


EXAMPLE 6.9


In Example 5.19, we considered a random variable $Y_1$, the proportion of impurities in
a chemical sample, and $Y_2$, the proportion of type I impurities among all impurities
in the sample. The joint density function was given by

\[ f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1, \; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

We are interested in $U = Y_1 Y_2$, which is the proportion of type I impurities in the
sample. Find the probability density function for $U$ and use it to find $E(U)$.


Solution


Because we are interested in $U = Y_1 Y_2$, let us first fix $Y_1$ at a value $y_1$, $0 < y_1 \le 1$,
and think in terms of the univariate transformation $U = h(Y_2) = y_1 Y_2$. We can then
determine the joint density function for $Y_1$ and $U$ (with $y_2 = u/y_1 = h^{-1}(u)$) to be

\[ g(y_1, u) = f[y_1, h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right| = \begin{cases} 2(1 - y_1) \left| \dfrac{1}{y_1} \right|, & 0 < y_1 \le 1, \; 0 \le u/y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Equivalently,

\[ g(y_1, u) = \begin{cases} 2(1 - y_1) \left( \dfrac{1}{y_1} \right), & 0 \le u \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

($U$ also ranges between 0 and 1, but $Y_1$ always must be greater than or equal to $U$.)
Further,

\[ f_U(u) = \int_{-\infty}^{\infty} g(y_1, u) \, dy_1 = \begin{cases} \displaystyle\int_u^1 2(1 - y_1) \left( \frac{1}{y_1} \right) dy_1, & 0 < u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Because, for $0 < u \le 1$,

\[ \int_u^1 2(1 - y_1) \left( \frac{1}{y_1} \right) dy_1 = 2 \int_u^1 \left( \frac{1}{y_1} - 1 \right) dy_1 = 2 \left( \ln y_1 \Big]_u^1 - y_1 \Big]_u^1 \right) = 2(-\ln u - 1 + u) = 2(u - \ln u - 1), \]

we obtain

\[ f_U(u) = \begin{cases} 2(u - \ln u - 1), & 0 < u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

(The symbol ln stands for natural logarithm.)
We now find $E(U)$:

\[ E(U) = \int_{-\infty}^{\infty} u f_U(u) \, du = \int_0^1 2u(u - \ln u - 1) \, du = 2 \left\{ \int_0^1 u^2 \, du - \int_0^1 u (\ln u) \, du - \int_0^1 u \, du \right\} = 2 \left\{ \frac{u^3}{3} \Big]_0^1 - \int_0^1 u (\ln u) \, du - \frac{u^2}{2} \Big]_0^1 \right\}. \]

The middle integral is most easily solved by using integration by parts, which
yields

\[ \int_0^1 u (\ln u) \, du = \left( \frac{u^2}{2} \right) (\ln u) \Big]_0^1 - \int_0^1 \left( \frac{u^2}{2} \right) \left( \frac{1}{u} \right) du = 0 - \frac{u^2}{4} \Big]_0^1 = -\frac{1}{4}. \]

Thus,

\[ E(U) = 2[(1/3) - (-1/4) - (1/2)] = 2(1/12) = 1/6. \]

This answer agrees with the answer to Example 5.21, where $E(U) = E(Y_1 Y_2)$ was
found by a different method.
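
Both conclusions of Example 6.9 can be checked by numerical integration. The sketch below is an illustration under stated assumptions: the helper names `f_u` and `midpoint` are invented here, and the midpoint rule is used because it never evaluates the integrand at $u = 0$, where $\ln u$ is undefined.

```python
import math

def f_u(u):
    """Density of U = Y1 * Y2 from Example 6.9: 2(u - ln u - 1) on (0, 1]."""
    return 2.0 * (u - math.log(u) - 1.0) if 0.0 < u <= 1.0 else 0.0

def midpoint(func, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

total_mass = midpoint(f_u, 0.0, 1.0)               # should be close to 1
mean_u = midpoint(lambda u: u * f_u(u), 0.0, 1.0)  # should be close to 1/6
print(total_mass, mean_u)
```

The logarithmic singularity at 0 is integrable, so the approximation converges, although somewhat more slowly than it would for a bounded integrand.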


Summary of the Transformation Method 


Let $U = h(Y)$, where $h(y)$ is either an increasing or decreasing function of $y$
for all $y$ such that $f_Y(y) > 0$.

1. Find the inverse function, $y = h^{-1}(u)$.

2. Evaluate $\dfrac{dh^{-1}}{du} = \dfrac{d[h^{-1}(u)]}{du}$.

3. Find $f_U(u)$ by

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|. \]
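
The three steps in the summary translate almost directly into code. The sketch below is a hedged illustration, not a general-purpose tool: the function name `transformed_density` is an assumption of this example, the caller must supply the inverse function $h^{-1}$ (step 1), and step 2 is approximated by a central difference instead of being computed analytically. It is applied to the densities of Examples 6.6 and 6.7.

```python
def transformed_density(f_y, h_inv, u, step=1e-6):
    """Approximate f_U(u) = f_Y(h^{-1}(u)) * |d h^{-1}/du|.

    f_y   -- density of Y
    h_inv -- inverse of a monotone h on the support of f_y (step 1)
    step  -- half-width of the central difference used for step 2
    """
    dh_inv_du = (h_inv(u + step) - h_inv(u - step)) / (2.0 * step)  # step 2
    return f_y(h_inv(u)) * abs(dh_inv_du)                           # step 3

def f_y(y):
    """Density 2y on (0, 1), as in Examples 6.6 and 6.7."""
    return 2.0 * y if 0.0 < y < 1.0 else 0.0

# Example 6.6: U = 3Y - 1, so h^{-1}(u) = (u + 1)/3 and f_U(u) = 2(u + 1)/9.
value_66 = transformed_density(f_y, lambda u: (u + 1.0) / 3.0, 0.5)
# Example 6.7: U = -4Y + 3, so h^{-1}(u) = (3 - u)/4 and f_U(u) = (3 - u)/8.
value_67 = transformed_density(f_y, lambda u: (3.0 - u) / 4.0, 1.0)
print(value_66, value_67)
```

The absolute value in the last line is what lets one function handle both the increasing and the decreasing case. The sketch does not verify monotonicity; that check remains the user's responsibility.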


Exercises 


6.23 In Exercise 6.1, we considered a random variable $Y$ with probability density function given by

\[ f(y) = \begin{cases} 2(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]

and used the method of distribution functions to find the density functions of

a $U_1 = 2Y - 1$.
b $U_2 = 1 - 2Y$.
c $U_3 = Y^2$.

Use the method of transformation to find the densities of $U_1$, $U_2$, and $U_3$.


6.24 In Exercise 6.4, we considered a random variable Y that possessed an exponential distribution 
with mean 4 and used the method of distribution functions to derive the density function for 
U = 3Y + 1. Use the method of transformations to derive the density function for U. 


6.25 In Exercise 6.11, we considered two electronic components that operate independently, each 
with life length governed by the exponential distribution with mean 1. We proceeded to use 
the method of distribution functions to obtain the distribution of the average length of life for 
the two components. Use the method of transformations to obtain the density function for the 
average life length of the two components. 


6.26 The Weibull density function is given by

\[ f(y) = \begin{cases} \dfrac{1}{\alpha} m y^{m-1} e^{-y^m/\alpha}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases} \]

where $\alpha$ and $m$ are positive constants. This density function is often used as a model for the
lengths of life of physical systems. Suppose $Y$ has the Weibull density just given. Find

a the density function of $U = Y^m$.
b $E(Y^k)$ for any positive integer $k$.


6.27 Let $Y$ have an exponential distribution with mean $\beta$.

a Prove that $W = \sqrt{Y}$ has a Weibull density with $\alpha = \beta$ and $m = 2$.
b Use the result in Exercise 6.26(b) to give $E(Y^{k/2})$ for any positive integer $k$.


6.28 Let $Y$ have a uniform (0, 1) distribution. Show that $U = -2 \ln(Y)$ has an exponential
distribution with mean 2.


6.29 The speed of a molecule in a uniform gas at equilibrium is a random variable $V$ whose density
function is given by

\[ f(v) = a v^2 e^{-b v^2}, \quad v > 0, \]

where $b = m/2kT$ and $k$, $T$, and $m$ denote Boltzmann's constant, the absolute temperature,
and the mass of the molecule, respectively.

a Derive the distribution of $W = mV^2/2$, the kinetic energy of the molecule.
b Find $E(W)$.


6.30 A fluctuating electric current $I$ may be considered a uniformly distributed random variable
over the interval (9, 11). If this current flows through a 2-ohm resistor, find the probability
density function of the power $P = 2I^2$.


6.31 The joint distribution for the length of life of two different types of components operating in a
system was given in Exercise 5.18 by

\[ f(y_1, y_2) = \begin{cases} (1/8) y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0, \; y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

The relative efficiency of the two types of components is measured by $U = Y_2/Y_1$. Find the
probability density function for $U$.


6.32 In Exercise 6.5, we considered a random variable $Y$ that has a uniform distribution on the
interval [1, 5]. The cost of delay is given by $U = 2Y^2 + 3$. Use the method of transformations
to derive the density function of $U$.


6.33 The proportion of impurities in certain ore samples is a random variable $Y$ with a density
function given by

\[ f(y) = \begin{cases} (3/2) y^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The dollar value of such samples is $U = 5 - (Y/2)$. Find the probability density function for $U$.


6.34 A density function sometimes used by engineers to model lengths of life of electronic
components is the Rayleigh density, given by

\[ f(y) = \begin{cases} \left( \dfrac{2y}{\theta} \right) e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

a If $Y$ has the Rayleigh density, find the probability density function for $U = Y^2$.
b Use the result of part (a) to find $E(Y)$ and $V(Y)$.


6.35 Let $Y_1$ and $Y_2$ be independent random variables, both uniformly distributed on (0, 1). Find the
probability density function for $U = Y_1 Y_2$.


6.36 Refer to Exercise 6.34. Let $Y_1$ and $Y_2$ be independent Rayleigh-distributed random variables.
Find the probability density function for $U = Y_1^2 + Y_2^2$. [Hint: Recall Example 6.8.]


6.5 The Method of Moment-Generating Functions


The moment-generating function method for finding the probability distribution of
a function of random variables $Y_1, Y_2, \ldots, Y_n$ is based on the following uniqueness
theorem.


THEOREM 6.1

Let $m_X(t)$ and $m_Y(t)$ denote the moment-generating functions of random variables
$X$ and $Y$, respectively. If both moment-generating functions exist and
$m_X(t) = m_Y(t)$ for all values of $t$, then $X$ and $Y$ have the same probability
distribution.

(The proof of Theorem 6.1 is beyond the scope of this text.)
If $U$ is a function of $n$ random variables, $Y_1, Y_2, \ldots, Y_n$, the first step in using
Theorem 6.1 is to find the moment-generating function of $U$:

\[ m_U(t) = E(e^{tU}). \]

Once the moment-generating function for $U$ has been found, it is compared with the
moment-generating functions for random variables with well-known distributions. If
$m_U(t)$ is identical to one of these, say, the moment-generating function for a random
variable $V$, then, by Theorem 6.1, $U$ and $V$ possess identical probability distributions.
The density functions, means, variances, and moment-generating functions for
some frequently encountered random variables are presented in Appendix 2. We will
illustrate the procedure with a few examples.


EXAMPLE 6.10


Suppose that $Y$ is a normally distributed random variable with mean $\mu$ and variance
$\sigma^2$. Show that

\[ Z = \frac{Y - \mu}{\sigma} \]

has a standard normal distribution, a normal distribution with mean 0 and variance 1.


Solution


We have seen in Example 4.16 that $Y - \mu$ has moment-generating function $e^{t^2 \sigma^2/2}$.
Hence,

\[ m_Z(t) = E(e^{tZ}) = E[e^{(t/\sigma)(Y - \mu)}] = m_{Y - \mu}\left( \frac{t}{\sigma} \right) = e^{(t/\sigma)^2 (\sigma^2/2)} = e^{t^2/2}. \]

On comparing $m_Z(t)$ with the moment-generating function of a normal random variable,
we see that $Z$ must be normally distributed with $E(Z) = 0$ and $V(Z) = 1$.


EXAMPLE 6.11


Let $Z$ be a normally distributed random variable with mean 0 and variance 1. Use the
method of moment-generating functions to find the probability distribution of $Z^2$.


Solution


The moment-generating function for $Z^2$ is

\[ m_{Z^2}(t) = E(e^{tZ^2}) = \int_{-\infty}^{\infty} e^{tz^2} f(z) \, dz = \int_{-\infty}^{\infty} e^{tz^2} \frac{e^{-z^2/2}}{\sqrt{2\pi}} \, dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(z^2/2)(1 - 2t)} \, dz. \]

This integral can be evaluated either by consulting a table of integrals or by noting
that, if $1 - 2t > 0$ (equivalently, $t < 1/2$), the integrand

\[ \frac{\exp\left[ -\left( \dfrac{z^2}{2} \right)(1 - 2t) \right]}{\sqrt{2\pi}} = \frac{\exp\left[ -\left( \dfrac{z^2}{2} \right) \Big/ (1 - 2t)^{-1} \right]}{\sqrt{2\pi}} \]

is proportional to the density function of a normally distributed random variable with
mean 0 and variance $(1 - 2t)^{-1}$. To make the integrand a normal density function (so
that the definite integral is equal to 1), multiply the numerator and denominator by
the standard deviation, $(1 - 2t)^{-1/2}$. Then

\[ m_{Z^2}(t) = \frac{1}{(1 - 2t)^{1/2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi} \, (1 - 2t)^{-1/2}} \exp\left[ -\left( \frac{z^2}{2} \right) \Big/ (1 - 2t)^{-1} \right] dz. \]

Because the integral equals 1, if $t < 1/2$,

\[ m_{Z^2}(t) = \frac{1}{(1 - 2t)^{1/2}} = (1 - 2t)^{-1/2}. \]

A comparison of $m_{Z^2}(t)$ with the moment-generating functions in Appendix 2
shows that $m_{Z^2}(t)$ is identical to the moment-generating function for the gamma-distributed
random variable with $\alpha = 1/2$ and $\beta = 2$. Thus, using Definition 4.10,
$Z^2$ has a $\chi^2$ distribution with $\nu = 1$ degree of freedom. It follows that the density
function for $U = Z^2$ is given by

\[ f_U(u) = \begin{cases} \dfrac{u^{-1/2} e^{-u/2}}{\Gamma(1/2) \, 2^{1/2}}, & u > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
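
The closed form $m_{Z^2}(t) = (1 - 2t)^{-1/2}$ found in Example 6.11 can be compared with a direct numerical evaluation of $E(e^{tZ^2})$. The sketch below is illustrative: the function names, the truncation of the integral to $[-12, 12]$, and the grid size are assumptions of this example, and the truncation is adequate only when $t$ is not close to 1/2.

```python
import math

def mgf_z_squared_numeric(t, half_width=12.0, n=40_000):
    """Midpoint-rule approximation of E(e^{t Z^2}) for standard normal Z (needs t < 1/2)."""
    width = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        z = -half_width + (i + 0.5) * width
        # e^{t z^2} times the standard normal density
        total += math.exp(t * z * z - z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return total * width

def mgf_z_squared_closed(t):
    """Closed form from Example 6.11: (1 - 2t)^(-1/2) for t < 1/2."""
    return (1.0 - 2.0 * t) ** -0.5

for t in (-1.0, 0.0, 0.25):
    print(t, mgf_z_squared_numeric(t), mgf_z_squared_closed(t))
```

As $t$ approaches 1/2 the integrand decays more and more slowly, so a wider integration range would be needed there.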
The method of moment-generating functions is often very useful for finding the
distributions of sums of independent random variables.


THEOREM 6.2

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with moment-generating
functions $m_{Y_1}(t), m_{Y_2}(t), \ldots, m_{Y_n}(t)$, respectively. If $U = Y_1 + Y_2 + \cdots + Y_n$, then

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t). \]


Proof

We know that, because the random variables $Y_1, Y_2, \ldots, Y_n$ are independent
(see Theorem 5.9),

\[ m_U(t) = E[e^{t(Y_1 + \cdots + Y_n)}] = E(e^{tY_1} e^{tY_2} \cdots e^{tY_n}) = E(e^{tY_1}) \times E(e^{tY_2}) \times \cdots \times E(e^{tY_n}). \]

Thus, by the definition of moment-generating functions,

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t). \]
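
Theorem 6.2 can be verified exactly for small discrete distributions, where every expectation is a finite sum. The sketch below is an illustration only: the two distributions (a fair die and a biased coin), the value of $t$, and the helper names `mgf` and `pmf_of_sum` are all assumptions chosen for the example.

```python
import math
from itertools import product

def mgf(pmf, t):
    """Moment-generating function E(e^{tY}) of a discrete pmf given as {value: probability}."""
    return sum(p * math.exp(t * y) for y, p in pmf.items())

def pmf_of_sum(pmf_a, pmf_b):
    """Probability function of A + B when A and B are independent."""
    out = {}
    for (a, pa), (b, pb) in product(pmf_a.items(), pmf_b.items()):
        out[a + b] = out.get(a + b, 0.0) + pa * pb   # independence: P(A=a, B=b) = pa * pb
    return out

die = {k: 1.0 / 6.0 for k in range(1, 7)}   # fair die (illustrative choice)
coin = {0: 0.3, 1: 0.7}                      # biased coin (illustrative choice)
t = 0.3

lhs = mgf(pmf_of_sum(die, coin), t)   # m_U(t) computed from the distribution of U
rhs = mgf(die, t) * mgf(coin, t)      # product of the individual mgfs
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding. The agreement depends on the independence used inside `pmf_of_sum`; for dependent variables the product rule generally fails.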


EXAMPLE 6.12


The number of customer arrivals at a checkout counter in a given interval of time
possesses approximately a Poisson probability distribution (see Section 3.8). If $Y_1$
denotes the time until the first arrival, $Y_2$ denotes the time between the first and
second arrival, ..., and $Y_n$ denotes the time between the $(n-1)$st and $n$th arrival,
then it can be shown that $Y_1, Y_2, \ldots, Y_n$ are independent random variables, with the
density function for $Y_i$ given by

\[ f_{Y_i}(y_i) = \begin{cases} \dfrac{1}{\theta} e^{-y_i/\theta}, & y_i > 0, \\ 0, & \text{otherwise.} \end{cases} \]

[Because the $Y_i$, for $i = 1, 2, \ldots, n$, are exponentially distributed, it follows that
$E(Y_i) = \theta$; that is, $\theta$ is the average time between arrivals.] Find the probability
density function for the waiting time from the opening of the counter until the $n$th
customer arrives. (If $Y_1, Y_2, \ldots$ denote successive interarrival times, we want the
density function of $U = Y_1 + Y_2 + \cdots + Y_n$.)


Solution


To use Theorem 6.2, we must first know $m_{Y_i}(t)$, $i = 1, 2, \ldots, n$. Because each of
the $Y_i$'s is exponentially distributed with mean $\theta$, $m_{Y_i}(t) = (1 - \theta t)^{-1}$ and, by
Theorem 6.2,

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t) = (1 - \theta t)^{-1} \times (1 - \theta t)^{-1} \times \cdots \times (1 - \theta t)^{-1} = (1 - \theta t)^{-n}. \]

This is the moment-generating function of a gamma-distributed random variable with
$\alpha = n$ and $\beta = \theta$. Theorem 6.1 implies that $U$ actually has this gamma distribution
and therefore that

\[ f_U(u) = \begin{cases} \dfrac{1}{\Gamma(n) \theta^n} \left( u^{n-1} e^{-u/\theta} \right), & u > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
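
The conclusion of Example 6.12 can be examined by simulation. The following sketch makes several assumptions that are not part of the text: the parameter values $n = 3$ and $\theta = 2$, the cutoff 6, the number of replications, and the helper names are all chosen for illustration. It also relies on the standard closed form for the distribution function of a gamma random variable with integer $\alpha = n$, namely $1 - e^{-u/\theta} \sum_{k=0}^{n-1} (u/\theta)^k / k!$.

```python
import math
import random

def erlang_cdf(u, n, theta):
    """P(U <= u) for a gamma distribution with integer alpha = n and beta = theta."""
    if u <= 0.0:
        return 0.0
    x = u / theta
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

def simulate_waiting_times(n, theta, reps, seed=0):
    """Simulate U = Y1 + ... + Yn with independent exponential(mean theta) interarrival times."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, so rate 1/theta gives mean theta.
    return [sum(rng.expovariate(1.0 / theta) for _ in range(n)) for _ in range(reps)]

n, theta, cutoff = 3, 2.0, 6.0
waits = simulate_waiting_times(n, theta, reps=100_000)
empirical = sum(w <= cutoff for w in waits) / len(waits)
print(empirical, erlang_cdf(cutoff, n, theta))   # the two values should be close
```

The simulated proportion and the gamma probability agree only up to sampling error, which is what one expects if the waiting time indeed has the gamma distribution derived above.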
The method of moment-generating functions can be used to establish some interesting
and useful results about the distributions of functions of normally distributed
random variables. Because these results will be used throughout Chapters 7-9, we
present them in the form of theorems.


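THEOREM 6.3 Let $Y_1, Y_2, \ldots, Y_n$ be independent normally distributed random variables with
$E(Y_i) = \mu_i$ and $V(Y_i) = \sigma_i^2$, for $i = 1, 2, \ldots, n$, and let $a_1, a_2, \ldots, a_n$ be
constants. If

\[ U = \sum_{i=1}^{n} a_i Y_i = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n, \]

then $U$ is a normally distributed random variable with

\[ E(U) = \sum_{i=1}^{n} a_i \mu_i = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n \]

and

\[ V(U) = \sum_{i=1}^{n} a_i^2 \sigma_i^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2. \]

Proof Because $Y_i$ is normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, $Y_i$ has
moment-generating function given by

\[ m_{Y_i}(t) = \exp\left( \mu_i t + \frac{\sigma_i^2 t^2}{2} \right). \]

[Recall that $\exp(\cdot)$ is a more convenient way to write $e^{(\cdot)}$ when the term in the
exponent is long or complex.] Therefore, $a_i Y_i$ has moment-generating function
given by

\[ m_{a_i Y_i}(t) = E(e^{t a_i Y_i}) = m_{Y_i}(a_i t) = \exp\left( \mu_i a_i t + \frac{a_i^2 \sigma_i^2 t^2}{2} \right). \]

Because the random variables $Y_i$ are independent, the random variables $a_i Y_i$
are independent, for $i = 1, 2, \ldots, n$, and Theorem 6.2 implies that

\[ m_U(t) = m_{a_1 Y_1}(t) \times m_{a_2 Y_2}(t) \times \cdots \times m_{a_n Y_n}(t) \]
\[ = \exp\left( \mu_1 a_1 t + \frac{a_1^2 \sigma_1^2 t^2}{2} \right) \times \cdots \times \exp\left( \mu_n a_n t + \frac{a_n^2 \sigma_n^2 t^2}{2} \right) \]
\[ = \exp\left( t \sum_{i=1}^{n} a_i \mu_i + \frac{t^2}{2} \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right). \]

Thus, $U$ has a normal distribution with mean $\sum_{i=1}^{n} a_i \mu_i$ and variance
$\sum_{i=1}^{n} a_i^2 \sigma_i^2$.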
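
Theorem 6.3 reduces questions about a linear combination of independent normal random variables to two sums. A minimal sketch follows; the function name `linear_combination_normal` and the numerical inputs are assumptions made for illustration and do not come from the text.

```python
def linear_combination_normal(coeffs, means, variances):
    """Return (E(U), V(U)) for U = sum(a_i * Y_i) with independent normal Y_i.

    By Theorem 6.3, U is normal with mean sum(a_i * mu_i)
    and variance sum(a_i^2 * sigma_i^2).
    """
    if not (len(coeffs) == len(means) == len(variances)):
        raise ValueError("coeffs, means, and variances must have the same length")
    mean_u = sum(a * m for a, m in zip(coeffs, means))
    var_u = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_u, var_u

# Sum and difference of two independent standard normal variables.
print(linear_combination_normal([1, 1], [0, 0], [1, 1]))    # (0, 2)
print(linear_combination_normal([1, -1], [0, 0], [1, 1]))   # (0, 2)
# An arbitrary illustrative case: U = 2*Y1 - 3*Y2.
print(linear_combination_normal([2, -3], [1, 4], [9, 1]))   # (-10, 45)
```

Note that the coefficients enter the variance squared, so a difference of independent normal variables has the same variance as their sum.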


THEOREM 6.4 Let $Y_1, Y_2, \ldots, Y_n$ be defined as in Theorem 6.3 and define $Z_i$ by

\[ Z_i = \frac{Y_i - \mu_i}{\sigma_i}, \quad i = 1, 2, \ldots, n. \]

Then $\sum_{i=1}^{n} Z_i^2$ has a $\chi^2$ distribution with $n$ degrees of freedom.


Proof Because $Y_i$ is normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, the result of
Example 6.10 implies that $Z_i$ is normally distributed with mean 0 and variance 1.
From Example 6.11, we then have that $Z_i^2$ is a $\chi^2$-distributed random variable
with 1 degree of freedom. Thus,

\[ m_{Z_i^2}(t) = (1 - 2t)^{-1/2}, \]

and from Theorem 6.2, with $V = \sum_{i=1}^{n} Z_i^2$,

\[ m_V(t) = m_{Z_1^2}(t) \times m_{Z_2^2}(t) \times \cdots \times m_{Z_n^2}(t) = (1 - 2t)^{-1/2} \times (1 - 2t)^{-1/2} \times \cdots \times (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}. \]

Because moment-generating functions are unique, $V$ has a $\chi^2$ distribution with
$n$ degrees of freedom.


Theorem 6.4 provides some clarification of the degrees of freedom associated with
a $\chi^2$ distribution. If $n$ independent, standard normal random variables are squared and
added together, the resulting sum has a $\chi^2$ distribution with $n$ degrees of freedom.


Summary of the Moment-Generating Function Method

Let $U$ be a function of the random variables $Y_1, Y_2, \ldots, Y_n$.

1. Find the moment-generating function for $U$, $m_U(t)$.
2. Compare $m_U(t)$ with other well-known moment-generating functions. If
$m_U(t) = m_V(t)$ for all values of $t$, Theorem 6.1 implies that $U$ and $V$
have identical distributions.


Exercises


6.37 Let $Y_1, Y_2, \ldots, Y_n$ be independent and identically distributed random variables such that for
$0 < p < 1$, $P(Y_i = 1) = p$ and $P(Y_i = 0) = q = 1 - p$. (Such random variables are called
Bernoulli random variables.)

a Find the moment-generating function for the Bernoulli random variable $Y_1$.
b Find the moment-generating function for $W = Y_1 + Y_2 + \cdots + Y_n$.
c What is the distribution of $W$?


6.38 Let $Y_1$ and $Y_2$ be independent random variables with moment-generating functions $m_{Y_1}(t)$
and $m_{Y_2}(t)$, respectively. If $a_1$ and $a_2$ are constants, and $U = a_1 Y_1 + a_2 Y_2$, show that the
moment-generating function for $U$ is $m_U(t) = m_{Y_1}(a_1 t) \times m_{Y_2}(a_2 t)$.


6.39 In Exercises 6.11 and 6.25, we considered two electronic components that operate independently,
each with a life length governed by the exponential distribution with mean 1. Use the
method of moment-generating functions to obtain the density function for the average life
length of the two components.


6.40 Suppose that $Y_1$ and $Y_2$ are independent, standard normal random variables. Find the density
function of $U = Y_1^2 + Y_2^2$.


6.41 Let $Y_1, Y_2, \ldots, Y_n$ be independent, normal random variables, each with mean $\mu$ and variance $\sigma^2$.
Let $a_1, a_2, \ldots, a_n$ denote known constants. Find the density function of the linear combination
$U = \sum_{i=1}^{n} a_i Y_i$.


6.42 A type of elevator has a maximum weight capacity $Y_1$, which is normally distributed with mean
5000 pounds and standard deviation 300 pounds. For a certain building equipped with this type
of elevator, the elevator's load, $Y_2$, is a normally distributed random variable with mean 4000
pounds and standard deviation 400 pounds. For any given time that the elevator is in use, find
the probability that it will be overloaded, assuming that $Y_1$ and $Y_2$ are independent.


6.43 Refer to Exercise 6.41. Let $Y_1, Y_2, \ldots, Y_n$ be independent, normal random variables, each with
mean $\mu$ and variance $\sigma^2$.

a Find the density function of $\bar{Y} = \dfrac{1}{n} \sum_{i=1}^{n} Y_i$.

b If $\sigma^2 = 16$ and $n = 25$, what is the probability that the sample mean, $\bar{Y}$, takes on a value
that is within one unit of the population mean, $\mu$? That is, find $P(|\bar{Y} - \mu| \le 1)$.

c If $\sigma^2 = 16$, find $P(|\bar{Y} - \mu| \le 1)$ if $n = 36$, $n = 64$, and $n = 81$. Interpret the results of
your calculations.


*6.44 The weight (in pounds) of "medium-size" watermelons is normally distributed with mean 15
and variance 4. A packing container for several melons has a nominal capacity of 140 pounds.
What is the maximum number of melons that should be placed in a single packing container
if the nominal weight limit is to be exceeded only 5% of the time? Give reasons for your
answer.


6.45 The manager of a construction job needs to figure prices carefully before submitting a bid. He
also needs to account for uncertainty (variability) in the amounts of products he might need.
To oversimplify the real situation, suppose that a project manager treats the amount of sand, in
yards, needed for a construction project as a random variable $Y_1$, which is normally distributed
with mean 10 yards and standard deviation .5 yard. The amount of cement mix needed, in
hundreds of pounds, is a random variable $Y_2$, which is normally distributed with mean 4 and
standard deviation .2. The sand costs $7 per yard, and the cement mix costs $3 per hundred
pounds. Adding $100 for other costs, he computes his total cost to be

\[ U = 100 + 7Y_1 + 3Y_2. \]

If $Y_1$ and $Y_2$ are independent, how much should the manager bid to ensure that the true costs
will exceed the amount bid with a probability of only .01? Is the independence assumption
reasonable here?


6.46 Suppose that $Y$ has a gamma distribution with $\alpha = n/2$ for some positive integer $n$ and $\beta$
equal to some specified value. Use the method of moment-generating functions to show that
$W = 2Y/\beta$ has a $\chi^2$ distribution with $n$ degrees of freedom.


6.47 A random variable $Y$ has a gamma distribution with $\alpha = 3.5$ and $\beta = 4.2$. Use the result in
Exercise 6.46 and the percentage points for the $\chi^2$ distributions given in Table 6, Appendix 3,
to find $P(Y > 33.627)$.


6.48 In a missile-testing program, one random variable of interest is the distance between the point
at which the missile lands and the center of the target at which the missile was aimed. If we
think of the center of the target as the origin of a coordinate system, we can let $Y_1$ denote
the north-south distance between the landing point and the target center and let $Y_2$ denote the
corresponding east-west distance. (Assume that north and east define positive directions.) The
distance between the landing point and the target center is then $U = \sqrt{Y_1^2 + Y_2^2}$. If $Y_1$ and $Y_2$
are independent, standard normal random variables, find the probability density function for $U$.


6.49 Let $Y_1$ be a binomial random variable with $n_1$ trials and probability of success given by $p$. Let
$Y_2$ be another binomial random variable with $n_2$ trials and probability of success also given
by $p$. If $Y_1$ and $Y_2$ are independent, find the probability function of $Y_1 + Y_2$.


6.50 Let $Y$ be a binomial random variable with $n$ trials and probability of success given by $p$. Show
that $n - Y$ is a binomial random variable with $n$ trials and probability of success given by $1 - p$.


6.51 Let $Y_1$ be a binomial random variable with $n_1$ trials and $p_1 = .2$ and $Y_2$ be an independent
binomial random variable with $n_2$ trials and $p_2 = .8$. Find the probability function of $Y_1 + n_2 - Y_2$.


6.52 Let $Y_1$ and $Y_2$ be independent Poisson random variables with means $\lambda_1$ and $\lambda_2$, respectively.
Find the

a probability function of $Y_1 + Y_2$.
b conditional probability function of $Y_1$, given that $Y_1 + Y_2 = m$.


6.53 Let $Y_1, Y_2, \ldots, Y_n$ be independent binomial random variables with $n_i$ trials and probability of
success given by $p_i$, $i = 1, 2, \ldots, n$.

a If all of the $n_i$'s are equal and all of the $p$'s are equal, find the distribution of $\sum_{i=1}^{n} Y_i$.
b If all of the $n_i$'s are different and all of the $p$'s are equal, find the distribution of $\sum_{i=1}^{n} Y_i$.
c If all of the $n_i$'s are different and all of the $p$'s are equal, find the conditional distribution
of $Y_1$ given $\sum_{i=1}^{n} Y_i = m$.
d If all of the $n_i$'s are different and all of the $p$'s are equal, find the conditional distribution
of $Y_1 + Y_2$ given $\sum_{i=1}^{n} Y_i = m$.
e If all of the $p$'s are different, does the method of moment-generating functions work well
to find the distribution of $\sum_{i=1}^{n} Y_i$? Why?


6.54 Let $Y_1, Y_2, \ldots, Y_n$ be independent Poisson random variables with means $\lambda_1, \lambda_2, \ldots, \lambda_n$,
respectively. Find the

a probability function of $\sum_{i=1}^{n} Y_i$.
b conditional probability function of $Y_1$, given that $\sum_{i=1}^{n} Y_i = m$.
c conditional probability function of $Y_1 + Y_2$, given that $\sum_{i=1}^{n} Y_i = m$.


6.55 Customers arrive at a department store checkout counter according to a Poisson distribution
with a mean of 7 per hour. In a given two-hour period, what is the probability that 20 or more
customers will arrive at the counter?


6.56 The length of time necessary to tune up a car is exponentially distributed with a mean of
.5 hour. If two cars are waiting for a tune-up and the service times are independent, what is
the probability that the total time for the two tune-ups will exceed 1.5 hours? [Hint: Recall the
result of Example 6.12.]


6.57 Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables such that each $Y_i$ has a gamma distribution
with parameters $\alpha_i$ and $\beta$. That is, the distributions of the $Y$'s might have different $\alpha$'s, but all
have the same value for $\beta$. Prove that $U = Y_1 + Y_2 + \cdots + Y_n$ has a gamma distribution with
parameters $\alpha_1 + \alpha_2 + \cdots + \alpha_n$ and $\beta$.


6.58 We saw in Exercise 5.159 that the negative binomial random variable $Y$ can be written as
$Y = \sum_{i=1}^{r} W_i$, where $W_1, W_2, \ldots, W_r$ are independent geometric random variables with
parameter $p$.

a Use this fact to derive the moment-generating function for $Y$.
b Use the moment-generating function to show that $E(Y) = r/p$ and $V(Y) = r(1 - p)/p^2$.
c Find the conditional probability function for $W_1$, given that $Y = W_1 + W_2 + \cdots + W_r = m$.


6.59 Show that if $Y_1$ has a $\chi^2$ distribution with $\nu_1$ degrees of freedom and $Y_2$ has a $\chi^2$ distribution
with $\nu_2$ degrees of freedom, then $U = Y_1 + Y_2$ has a $\chi^2$ distribution with $\nu_1 + \nu_2$ degrees of
freedom, provided that $Y_1$ and $Y_2$ are independent.


6.60 Suppose that $W = Y_1 + Y_2$ where $Y_1$ and $Y_2$ are independent. If $W$ has a $\chi^2$ distribution with
$\nu$ degrees of freedom and $Y_1$ has a $\chi^2$ distribution with $\nu_1 < \nu$ degrees of freedom, show that
$Y_2$ has a $\chi^2$ distribution with $\nu - \nu_1$ degrees of freedom.


6.61 Refer to Exercise 6.52. Suppose that $W = Y_1 + Y_2$ where $Y_1$ and $Y_2$ are independent. If $W$ has
a Poisson distribution with mean $\lambda$ and $Y_1$ has a Poisson distribution with mean $\lambda_1 < \lambda$, show
that $Y_2$ has a Poisson distribution with mean $\lambda - \lambda_1$.


6.62 Let $Y_1$ and $Y_2$ be independent normal random variables, each with mean 0 and variance $\sigma^2$.
Define $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$. Show that $U_1$ and $U_2$ are independent normal random
variables, each with mean 0 and variance $2\sigma^2$. [Hint: If $(U_1, U_2)$ has a joint moment-generating
function $m(t_1, t_2)$, then $U_1$ and $U_2$ are independent if and only if $m(t_1, t_2) = m_{U_1}(t_1) m_{U_2}(t_2)$.]


6.6 Multivariable Transformations Using Jacobians (Optional)


If $Y$ is a random variable with density function $f_Y(y)$, the method of transformations
(Section 6.4) can be used to find the density function for $U = h(Y)$, provided that $h(y)$
is either increasing or decreasing for all $y$ such that $f_Y(y) > 0$. If $h(y)$ is increasing or
decreasing for all $y$ in the support of $f_Y(y)$, the function $h(\cdot)$ is one-to-one, and there
is an inverse function, $h^{-1}(\cdot)$, such that $y = h^{-1}(u)$. Further, the density function for
$U$ is given by

\[ f_U(u) = f_Y(h^{-1}(u)) \left| \frac{dh^{-1}(u)}{du} \right|. \]


Suppose that $Y_1$ and $Y_2$ are jointly continuous random variables and that $U_1 = Y_1 + Y_2$
and $U_2 = Y_1 - Y_2$. How can we find the joint density function of $U_1$ and $U_2$?

For the rest of this section, we will write the joint density of $Y_1$ and $Y_2$ as
$f_{Y_1, Y_2}(y_1, y_2)$. Extending the ideas of Section 6.4, the support of the joint density
$f_{Y_1, Y_2}(y_1, y_2)$ is the set of all values of $(y_1, y_2)$ such that $f_{Y_1, Y_2}(y_1, y_2) > 0$.


The Bivariate Transformation Method 


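Suppose that $Y_1$ and $Y_2$ are continuous random variables with joint density
function $f_{Y_1, Y_2}(y_1, y_2)$ and that for all $(y_1, y_2)$ such that $f_{Y_1, Y_2}(y_1, y_2) > 0$,

\[ u_1 = h_1(y_1, y_2) \quad \text{and} \quad u_2 = h_2(y_1, y_2) \]

is a one-to-one transformation from $(y_1, y_2)$ to $(u_1, u_2)$ with inverse

\[ y_1 = h_1^{-1}(u_1, u_2) \quad \text{and} \quad y_2 = h_2^{-1}(u_1, u_2). \]

If $h_1^{-1}(u_1, u_2)$ and $h_2^{-1}(u_1, u_2)$ have continuous partial derivatives with respect
to $u_1$ and $u_2$ and Jacobian

\[ J = \det \begin{bmatrix} \dfrac{\partial h_1^{-1}}{\partial u_1} & \dfrac{\partial h_1^{-1}}{\partial u_2} \\[2ex] \dfrac{\partial h_2^{-1}}{\partial u_1} & \dfrac{\partial h_2^{-1}}{\partial u_2} \end{bmatrix} = \frac{\partial h_1^{-1}}{\partial u_1} \frac{\partial h_2^{-1}}{\partial u_2} - \frac{\partial h_2^{-1}}{\partial u_1} \frac{\partial h_1^{-1}}{\partial u_2} \ne 0, \]

then the joint density of $U_1$ and $U_2$ is

\[ f_{U_1, U_2}(u_1, u_2) = f_{Y_1, Y_2}\left( h_1^{-1}(u_1, u_2), h_2^{-1}(u_1, u_2) \right) |J|, \]

where $|J|$ is the absolute value of the Jacobian.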


We will not prove this result, but it follows from calculus results used for change
of variables in multiple integration. (Recall that sometimes double integrals are more
easily calculated if we use polar coordinates instead of Euclidean coordinates; see
Exercise 4.194.) The absolute value of the Jacobian, $|J|$, in the multivariate transformation
is analogous to the quantity $|dh^{-1}(u)/du|$ that is used when making the
one-variable transformation $U = h(Y)$.

A word of caution is in order. Be sure that the bivariate transformation $u_1 =
h_1(y_1, y_2)$, $u_2 = h_2(y_1, y_2)$ is a one-to-one transformation for all $(y_1, y_2)$ such that
$f_{Y_1, Y_2}(y_1, y_2) > 0$. This step is easily overlooked. If the bivariate transformation is not
one-to-one and this method is blindly applied, the resulting "density" function will
not have the necessary properties of a valid density function. We illustrate the use of
this method in the following examples.


EXAMPLE 6.13 Let $Y_1$ and $Y_2$ be independent standard normal random variables. If $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$, both $U_1$ and $U_2$ are linear combinations of independent normally distributed random variables, and Theorem 6.3 implies that $U_1$ is normally distributed with mean $0 + 0 = 0$ and variance $1 + 1 = 2$. Similarly, $U_2$ has a normal distribution with mean 0 and variance 2. What is the joint density of $U_1$ and $U_2$?


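Solution The density functions for $Y_1$ and $Y_2$ are

\[ f_1(y_1) = \frac{e^{-(1/2)y_1^2}}{\sqrt{2\pi}}, \quad -\infty < y_1 < \infty, \qquad f_2(y_2) = \frac{e^{-(1/2)y_2^2}}{\sqrt{2\pi}}, \quad -\infty < y_2 < \infty, \]

and the independence of $Y_1$ and $Y_2$ implies that their joint density is

\[ f_{Y_1,Y_2}(y_1, y_2) = \frac{e^{-(1/2)y_1^2 - (1/2)y_2^2}}{2\pi}, \quad -\infty < y_1 < \infty,\ -\infty < y_2 < \infty. \]

In this case $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $-\infty < y_1 < \infty$ and $-\infty < y_2 < \infty$, and we are interested in the transformation

\[ u_1 = y_1 + y_2 = h_1(y_1, y_2) \quad\text{and}\quad u_2 = y_1 - y_2 = h_2(y_1, y_2) \]

with inverse transformation

\[ y_1 = (u_1 + u_2)/2 = h_1^{-1}(u_1, u_2) \quad\text{and}\quad y_2 = (u_1 - u_2)/2 = h_2^{-1}(u_1, u_2). \]

Because $\partial h_1^{-1}/\partial u_1 = 1/2$, $\partial h_1^{-1}/\partial u_2 = 1/2$, $\partial h_2^{-1}/\partial u_1 = 1/2$, and $\partial h_2^{-1}/\partial u_2 = -1/2$, the Jacobian of this transformation is

\[ J = \det\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} = (1/2)(-1/2) - (1/2)(1/2) = -1/2 \]

and the joint density of $U_1$ and $U_2$ is [with $\exp(\cdot) = e^{(\cdot)}$]

\[ f_{U_1,U_2}(u_1, u_2) = \frac{\exp\left[-\frac{1}{2}\left(\frac{u_1+u_2}{2}\right)^2 - \frac{1}{2}\left(\frac{u_1-u_2}{2}\right)^2\right]}{2\pi}\,\left|-\frac{1}{2}\right|, \quad -\infty < (u_1+u_2)/2 < \infty,\ -\infty < (u_1-u_2)/2 < \infty. \]

A little algebra yields

\[ -\frac{1}{2}\left(\frac{u_1+u_2}{2}\right)^2 - \frac{1}{2}\left(\frac{u_1-u_2}{2}\right)^2 = -\frac{1}{4}u_1^2 - \frac{1}{4}u_2^2 \]

and

\[ \{(u_1, u_2): -\infty < (u_1+u_2)/2 < \infty,\ -\infty < (u_1-u_2)/2 < \infty\} = \{(u_1, u_2): -\infty < u_1 < \infty,\ -\infty < u_2 < \infty\}. \]

Finally, because $4\pi = \sqrt{2}\sqrt{2\pi}\,\sqrt{2}\sqrt{2\pi}$,

\[ f_{U_1,U_2}(u_1, u_2) = \frac{e^{-u_1^2/4}}{\sqrt{2}\sqrt{2\pi}} \cdot \frac{e^{-u_2^2/4}}{\sqrt{2}\sqrt{2\pi}}, \quad -\infty < u_1 < \infty,\ -\infty < u_2 < \infty. \]

Notice that $U_1$ and $U_2$ are independent and normally distributed, both with mean 0 and variance 2. The extra information provided by the joint distribution of $U_1$ and $U_2$ is that the two variables are independent!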
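The result of Example 6.13 can be checked numerically. The following Python sketch is illustrative only: all function names are hypothetical (they do not come from the text), and the Jacobian is estimated by central finite differences rather than derived analytically as in the solution. It evaluates the bivariate transformation formula at a point and compares the value with the product of two normal densities with mean 0 and variance 2.

```python
import math


def joint_density_y(y1, y2):
    """Joint density of two independent standard normal variables."""
    return math.exp(-0.5 * y1 * y1 - 0.5 * y2 * y2) / (2.0 * math.pi)


def inverse_transform(u1, u2):
    """Inverse of u1 = y1 + y2, u2 = y1 - y2."""
    return (u1 + u2) / 2.0, (u1 - u2) / 2.0


def numerical_jacobian(inverse, u1, u2, step=1e-6):
    """Determinant of the matrix of partial derivatives of the inverse map,
    estimated by central differences (an assumption of this sketch)."""
    a1, a2 = inverse(u1 + step, u2)
    b1, b2 = inverse(u1 - step, u2)
    c1, c2 = inverse(u1, u2 + step)
    d1, d2 = inverse(u1, u2 - step)
    dy1_du1 = (a1 - b1) / (2.0 * step)
    dy2_du1 = (a2 - b2) / (2.0 * step)
    dy1_du2 = (c1 - d1) / (2.0 * step)
    dy2_du2 = (c2 - d2) / (2.0 * step)
    return dy1_du1 * dy2_du2 - dy2_du1 * dy1_du2


def joint_density_u(u1, u2):
    """Bivariate transformation formula: f_Y evaluated at the inverse, times |J|."""
    y1, y2 = inverse_transform(u1, u2)
    jacobian = numerical_jacobian(inverse_transform, u1, u2)
    return joint_density_y(y1, y2) * abs(jacobian)


def normal_density(x, variance):
    """Density of a normal variable with mean 0 and the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def closed_form_u(u1, u2):
    """Product of two N(0, 2) densities, the closed form found in the example."""
    return normal_density(u1, 2.0) * normal_density(u2, 2.0)


if __name__ == "__main__":
    point = (0.3, -1.2)
    print("transformation formula:", joint_density_u(*point))
    print("closed form:           ", closed_form_u(*point))
```

Agreement at a handful of points is only a sanity check, not a proof; the factorization of the joint density is what establishes independence.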


The multivariable transformation method is also useful if we are interested in a single function of $Y_1$ and $Y_2$, say, $U_1 = h_1(Y_1, Y_2)$. Because we have only one function of $Y_1$ and $Y_2$, we can use the method of bivariate transformations to find the joint distribution of $U_1$ and another function $U_2 = h_2(Y_1, Y_2)$ and then find the desired marginal density of $U_1$ by integrating the joint density. Because we are really interested in only the distribution of $U_1$, we would typically choose the other function $U_2 = h_2(Y_1, Y_2)$ so that the bivariate transformation is easy to invert and the Jacobian is easy to work with. We illustrate this technique in the following example.


EXAMPLE 6.14 Let $Y_1$ and $Y_2$ be independent exponential random variables, both with mean $\beta > 0$. Find the density function of

\[ U = \frac{Y_1}{Y_1 + Y_2}. \]

Solution The density functions for $Y_1$ and $Y_2$ are, again using $\exp(\cdot) = e^{(\cdot)}$,

\[ f_1(y_1) = \begin{cases} \dfrac{1}{\beta}\exp(-y_1/\beta), & 0 < y_1, \\ 0, & \text{otherwise}, \end{cases} \qquad\text{and}\qquad f_2(y_2) = \begin{cases} \dfrac{1}{\beta}\exp(-y_2/\beta), & 0 < y_2, \\ 0, & \text{otherwise}. \end{cases} \]

Their joint density is

\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} \dfrac{1}{\beta^2}\exp[-(y_1 + y_2)/\beta], & 0 < y_1,\ 0 < y_2, \\ 0, & \text{otherwise}, \end{cases} \]

because $Y_1$ and $Y_2$ are independent.

In this case, $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $(y_1, y_2)$ such that $0 < y_1$, $0 < y_2$, and we are interested in the function $U_1 = Y_1/(Y_1 + Y_2)$. If we consider the function $u_1 = y_1/(y_1 + y_2)$, there are obviously many values for $(y_1, y_2)$ that will give the same value for $u_1$. Let us define

\[ u_1 = \frac{y_1}{y_1 + y_2} = h_1(y_1, y_2) \quad\text{and}\quad u_2 = y_1 + y_2 = h_2(y_1, y_2). \]

This choice of $u_2$ yields a convenient inverse transformation:

\[ y_1 = u_1 u_2 = h_1^{-1}(u_1, u_2) \quad\text{and}\quad y_2 = u_2(1 - u_1) = h_2^{-1}(u_1, u_2). \]

The Jacobian of this transformation is

\[ J = \det\begin{bmatrix} u_2 & u_1 \\ -u_2 & 1 - u_1 \end{bmatrix} = u_2(1 - u_1) - (-u_2)(u_1) = u_2, \]

and the joint density of $U_1$ and $U_2$ is

\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2}\exp\{-[u_1 u_2 + u_2(1 - u_1)]/\beta\}\,|u_2|, & 0 < u_1 u_2,\ 0 < u_2(1 - u_1), \\ 0, & \text{otherwise}. \end{cases} \]

In this case, $f_{U_1,U_2}(u_1, u_2) > 0$ if $u_1$ and $u_2$ are such that $0 < u_1 u_2$, $0 < u_2(1 - u_1)$. Notice that if $0 < u_1 u_2$, then

\[ 0 < u_2(1 - u_1) = u_2 - u_1 u_2 \iff 0 < u_1 u_2 < u_2 \iff 0 < u_1 < 1. \]

If $0 < u_1 < 1$, then $0 < u_2(1 - u_1)$ implies that $0 < u_2$. Therefore, the region of support for the joint density of $U_1$ and $U_2$ is $\{(u_1, u_2): 0 < u_1 < 1,\ 0 < u_2\}$, and the joint density of $U_1$ and $U_2$ is given by

\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2}\,u_2\,e^{-u_2/\beta}, & 0 < u_1 < 1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases} \]

Using Theorem 5.5 it is easily seen that $U_1$ and $U_2$ are independent. The marginal densities of $U_1$ and $U_2$ can be obtained by integrating the joint density derived earlier. In Exercise 6.63 you will show that $U_1$ is uniformly distributed over (0, 1) and that $U_2$ has a gamma density with parameters $\alpha = 2$ and $\beta$.
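A simulation offers an informal check on the conclusions of Example 6.14. The Python sketch below is a hedged illustration, not part of the text's method: the function names, the value $\beta = 2$, the seed, and the sample size are all arbitrary assumptions. It draws pairs of exponential variables and summarizes $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$. If $U_1$ is uniform on (0, 1) its mean and variance should be near 1/2 and 1/12, and if $U_2$ is gamma with $\alpha = 2$ its mean should be near $2\beta$.

```python
import random


def simulate_ratio_and_sum(beta, n_draws, seed):
    """Draw (U1, U2) = (Y1/(Y1+Y2), Y1+Y2) for independent exponentials with mean beta."""
    rng = random.Random(seed)
    ratios, sums = [], []
    for _ in range(n_draws):
        y1 = rng.expovariate(1.0 / beta)  # expovariate takes the rate, 1/mean
        y2 = rng.expovariate(1.0 / beta)
        sums.append(y1 + y2)
        ratios.append(y1 / (y1 + y2))
    return ratios, sums


def mean(values):
    return sum(values) / len(values)


def variance(values):
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / (len(values) - 1)


def correlation(xs, ys):
    """Sample correlation; a value near 0 is consistent with (but does not prove) independence."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cross / (variance(xs) * variance(ys)) ** 0.5


if __name__ == "__main__":
    ratios, sums = simulate_ratio_and_sum(beta=2.0, n_draws=50_000, seed=1)
    print("mean of U1:", mean(ratios), "variance of U1:", variance(ratios))
    print("mean of U2:", mean(sums))
    print("correlation of U1 and U2:", correlation(ratios, sums))
```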


The technique described in this section can be viewed as a one-step version of the two-step process illustrated in Example 6.9.

In Example 6.14, it was more difficult to find the region of support (where the joint density is positive) than it was to find the equation of the joint density function. As you will see in the next example and the exercises, this is often the case.


EXAMPLE 6.15 In Example 6.9, we considered random variables $Y_1$ and $Y_2$ with joint density function

\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}, \end{cases} \]

and were interested in $U = Y_1 Y_2$. Find the probability density function for $U$ by using the bivariate transformation method.

Solution In this case $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $(y_1, y_2)$ such that $0 \le y_1 < 1$, $0 \le y_2 \le 1$, and we are interested in the function $U_2 = Y_1 Y_2$. If we consider the function $u_2 = y_1 y_2$, this function alone is not a one-to-one function of the variables $(y_1, y_2)$. Consider

\[ u_1 = y_1 = h_1(y_1, y_2) \quad\text{and}\quad u_2 = y_1 y_2 = h_2(y_1, y_2). \]

For this choice of $u_1$, and $0 \le y_1 < 1$, $0 \le y_2 \le 1$, the transformation from $(y_1, y_2)$ to $(u_1, u_2)$ is one-to-one and

\[ y_1 = u_1 = h_1^{-1}(u_1, u_2) \quad\text{and}\quad y_2 = u_2/u_1 = h_2^{-1}(u_1, u_2). \]

The Jacobian is

\[ J = \det\begin{bmatrix} 1 & 0 \\ -u_2/u_1^2 & 1/u_1 \end{bmatrix} = 1(1/u_1) - (-u_2/u_1^2)(0) = 1/u_1. \]

The original variable of interest is $U_2 = Y_1 Y_2$, and the joint density of $U_1$ and $U_2$ is

\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} 2(1 - u_1)\left|\dfrac{1}{u_1}\right|, & 0 \le u_1 < 1,\ 0 \le u_2/u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases} \]

Because

\[ \{(u_1, u_2): 0 \le u_1 < 1,\ 0 \le u_2/u_1 \le 1\} = \{(u_1, u_2): 0 \le u_2 \le u_1 < 1\}, \]

the joint density of $U_1$ and $U_2$ is

\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} 2(1 - u_1)\left(\dfrac{1}{u_1}\right), & 0 \le u_2 \le u_1 < 1, \\ 0, & \text{otherwise}. \end{cases} \]

This joint density is exactly the same as the joint density obtained in Example 6.9 if we identify the variables $Y_1$ and $U$ used in Example 6.9 with the variables $U_1$ and $U_2$, respectively, used here. With this identification, the marginal density of $U_2$ is precisely the density of $U$ obtained in Example 6.9; that is,

\[ f_2(u_2) = \begin{cases} 2(u_2 - \ln u_2 - 1), & 0 < u_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]
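The marginal density found in Example 6.15 can be compared with a simulation of $U = Y_1 Y_2$. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the distribution function $u^2 - 2u\ln u$ is obtained here by integrating $2(t - \ln t - 1)$ from 0 to $u$ (it is not quoted from the text), and $Y_1$ is sampled by inverting its marginal distribution function $1 - (1 - y_1)^2$, which follows because the joint density $2(1 - y_1)$ factors into a density for $Y_1$ times a uniform density for $Y_2$.

```python
import math
import random


def cdf_of_product(u):
    """P(U <= u) for 0 < u <= 1, from integrating the density 2(t - ln t - 1)."""
    return u * u - 2.0 * u * math.log(u)


def simulate_products(n_draws, seed):
    """Simulate U = Y1 * Y2 where (Y1, Y2) has joint density 2(1 - y1) on the unit square."""
    rng = random.Random(seed)
    products = []
    for _ in range(n_draws):
        # Y1 has marginal density 2(1 - y1); invert F(y1) = 1 - (1 - y1)^2.
        y1 = 1.0 - math.sqrt(1.0 - rng.random())
        # Y2 is uniform on (0, 1) and independent of Y1.
        y2 = rng.random()
        products.append(y1 * y2)
    return products


def empirical_cdf(values, threshold):
    """Fraction of simulated values that do not exceed the threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)


if __name__ == "__main__":
    draws = simulate_products(n_draws=100_000, seed=3)
    for u in (0.05, 0.25, 0.5):
        print(u, "simulated:", empirical_cdf(draws, u), "formula:", cdf_of_product(u))
```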
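If $Y_1, Y_2, \ldots, Y_k$ are jointly continuous random variables and

\[ U_1 = h_1(Y_1, Y_2, \ldots, Y_k),\quad U_2 = h_2(Y_1, Y_2, \ldots, Y_k),\quad \ldots,\quad U_k = h_k(Y_1, Y_2, \ldots, Y_k), \]

where the transformation

\[ u_1 = h_1(y_1, y_2, \ldots, y_k),\quad u_2 = h_2(y_1, y_2, \ldots, y_k),\quad \ldots,\quad u_k = h_k(y_1, y_2, \ldots, y_k) \]

is a one-to-one transformation from $(y_1, y_2, \ldots, y_k)$ to $(u_1, u_2, \ldots, u_k)$ with inverse

\[ y_1 = h_1^{-1}(u_1, u_2, \ldots, u_k),\quad y_2 = h_2^{-1}(u_1, u_2, \ldots, u_k),\quad \ldots,\quad y_k = h_k^{-1}(u_1, u_2, \ldots, u_k), \]

and $h_1^{-1}(u_1, u_2, \ldots, u_k)$, $h_2^{-1}(u_1, u_2, \ldots, u_k)$, $\ldots$, $h_k^{-1}(u_1, u_2, \ldots, u_k)$ have continuous partial derivatives with respect to $u_1, u_2, \ldots, u_k$ and Jacobian

\[ J = \det\begin{bmatrix} \dfrac{\partial h_1^{-1}}{\partial u_1} & \dfrac{\partial h_1^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_1^{-1}}{\partial u_k} \\[2ex] \dfrac{\partial h_2^{-1}}{\partial u_1} & \dfrac{\partial h_2^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_2^{-1}}{\partial u_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_k^{-1}}{\partial u_1} & \dfrac{\partial h_k^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_k^{-1}}{\partial u_k} \end{bmatrix} \neq 0, \]

then a result analogous to the one presented in this section can be used to find the joint density of $U_1, U_2, \ldots, U_k$. This requires the user to find the determinant of a $k \times k$ matrix, a skill that is not required in the rest of this text. For more details, see "References and Further Readings" at the end of the chapter.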


Exercises

*6.63 In Example 6.14, $Y_1$ and $Y_2$ were independent exponentially distributed random variables, both with mean $\beta$. We defined $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$ and determined the joint density of $(U_1, U_2)$ to be

\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2}\,u_2\,e^{-u_2/\beta}, & 0 < u_1 < 1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases} \]

a Show that $U_1$ is uniformly distributed over the interval (0, 1).
b Show that $U_2$ has a gamma density with parameters $\alpha = 2$ and $\beta$.
c Establish that $U_1$ and $U_2$ are independent.

*6.64 Refer to Exercise 6.63 and Example 6.14. Suppose that $Y_1$ has a gamma distribution with parameters $\alpha_1$ and $\beta$, that $Y_2$ is gamma distributed with parameters $\alpha_2$ and $\beta$, and that $Y_1$ and $Y_2$ are independent. Let $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$.
a Derive the joint density function for $U_1$ and $U_2$.
b Show that the marginal distribution of $U_1$ is a beta distribution with parameters $\alpha_1$ and $\alpha_2$.
c Show that the marginal distribution of $U_2$ is a gamma distribution with parameters $\alpha = \alpha_1 + \alpha_2$ and $\beta$.
d Establish that $U_1$ and $U_2$ are independent.

6.65 Let $Z_1$ and $Z_2$ be independent standard normal random variables and $U_1 = Z_1$ and $U_2 = Z_1 + Z_2$.
a Derive the joint density of $U_1$ and $U_2$.
b Use Theorem 5.12 to give $E(U_1)$, $E(U_2)$, $V(U_1)$, $V(U_2)$, and $\mathrm{Cov}(U_1, U_2)$.
c Are $U_1$ and $U_2$ independent? Why?
d Refer to Section 5.10. Show that $U_1$ and $U_2$ have a bivariate normal distribution. Identify all the parameters of the appropriate bivariate normal distribution.

*6.66 Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1 + Y_2$ and $U_2 = Y_2$.
a Show that the joint density of $(U_1, U_2)$ is
\[ f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 - u_2, u_2). \]
b Show that the marginal density function for $U_1$ is
\[ f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 - u_2, u_2)\,du_2. \]
c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is
\[ f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 - u_2)\,f_{Y_2}(u_2)\,du_2. \]
That is, the density of $Y_1 + Y_2$ is the convolution of the densities $f_{Y_1}(\cdot)$ and $f_{Y_2}(\cdot)$.

*6.67 Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1/Y_2$ and $U_2 = Y_2$.
a Show that the joint density of $(U_1, U_2)$ is
\[ f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 u_2, u_2)\,|u_2|. \]
b Show that the marginal density function for $U_1$ is
\[ f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 u_2, u_2)\,|u_2|\,du_2. \]
c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is
\[ f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 u_2)\,f_{Y_2}(u_2)\,|u_2|\,du_2. \]

6.68 Let $Y_1$ and $Y_2$ have joint density function

\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 8 y_1 y_2, & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{otherwise}, \end{cases} \]

and $U_1 = Y_1/Y_2$ and $U_2 = Y_2$.
a Derive the joint density function for $(U_1, U_2)$.
b Show that $U_1$ and $U_2$ are independent.

*6.69 The random variables $Y_1$ and $Y_2$ are independent, both with density

\[ f(y) = \begin{cases} \dfrac{1}{y^2}, & 1 < y, \\ 0, & \text{otherwise}. \end{cases} \]

Let $U_1 = \dfrac{Y_1}{Y_1 + Y_2}$ and $U_2 = Y_1 + Y_2$.
a What is the joint density of $Y_1$ and $Y_2$?
b Show that the joint density of $U_1$ and $U_2$ is given by
\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{u_1^2 (1 - u_1)^2 u_2^3}, & 1/u_1 < u_2,\ 0 < u_1 < 1/2 \ \text{ and } \ 1/(1 - u_1) < u_2,\ 1/2 \le u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases} \]
c Sketch the region where $f_{U_1,U_2}(u_1, u_2) > 0$.
d Show that the marginal density of $U_1$ is
\[ f_{U_1}(u_1) = \begin{cases} \dfrac{1}{2(1 - u_1)^2}, & 0 \le u_1 < 1/2, \\ \dfrac{1}{2 u_1^2}, & 1/2 \le u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases} \]
e Are $U_1$ and $U_2$ independent? Why or why not?

*6.70 Suppose that $Y_1$ and $Y_2$ are independent and that both are uniformly distributed on the interval (0, 1), and let $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$.
a Show that the joint density of $U_1$ and $U_2$ is given by
\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} 1/2, & -u_1 < u_2 < u_1,\ 0 < u_1 < 1 \ \text{ and } \ u_1 - 2 < u_2 < 2 - u_1,\ 1 \le u_1 < 2, \\ 0, & \text{otherwise}. \end{cases} \]
b Sketch the region where $f_{U_1,U_2}(u_1, u_2) > 0$.
c Show that the marginal density of $U_1$ is
\[ f_{U_1}(u_1) = \begin{cases} u_1, & 0 < u_1 < 1, \\ 2 - u_1, & 1 \le u_1 < 2, \\ 0, & \text{otherwise}. \end{cases} \]
d Show that the marginal density of $U_2$ is
\[ f_{U_2}(u_2) = \begin{cases} 1 + u_2, & -1 < u_2 < 0, \\ 1 - u_2, & 0 \le u_2 < 1, \\ 0, & \text{otherwise}. \end{cases} \]
e Are $U_1$ and $U_2$ independent? Why or why not?

*6.71 Suppose that $Y_1$ and $Y_2$ are independent exponentially distributed random variables, both with mean $\beta$, and define $U_1 = Y_1 + Y_2$ and $U_2 = Y_1/Y_2$.
a Show that the joint density of $(U_1, U_2)$ is
\[ f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2}\,u_1\,e^{-u_1/\beta}\,\dfrac{1}{(1 + u_2)^2}, & 0 < u_1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases} \]
b Are $U_1$ and $U_2$ independent? Why?


6.7 Order Statistics


Many functions of random variables of interest in practice depend on the relative 
magnitudes of the observed variables. For instance, we may be interested in the 
fastest time in an automobile race or the heaviest mouse among those fed on a certain 
diet. Thus, we often order observed random variables according to their magnitudes. 
The resulting ordered variables are called order statistics. 


Formally, let $Y_1, Y_2, \ldots, Y_n$ denote independent continuous random variables with distribution function $F(y)$ and density function $f(y)$. We denote the ordered random variables $Y_i$ by $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$, where $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$. (Because the random variables are continuous, the equality signs can be ignored.) Using this notation,

\[ Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n) \]

is the minimum of the random variables $Y_i$, and

\[ Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n) \]

is the maximum of the random variables $Y_i$.

The probability density functions for $Y_{(1)}$ and $Y_{(n)}$ can be found using the method of distribution functions. We will derive the density function of $Y_{(n)}$ first. Because $Y_{(n)}$ is the maximum of $Y_1, Y_2, \ldots, Y_n$, the event $(Y_{(n)} \le y)$ will occur if and only if the events $(Y_i \le y)$ occur for every $i = 1, 2, \ldots, n$. That is,

\[ P(Y_{(n)} \le y) = P(Y_1 \le y,\ Y_2 \le y,\ \ldots,\ Y_n \le y). \]

Because the $Y_i$ are independent and $P(Y_i \le y) = F(y)$ for $i = 1, 2, \ldots, n$, it follows that the distribution function of $Y_{(n)}$ is given by

\[ F_{Y_{(n)}}(y) = P(Y_{(n)} \le y) = P(Y_1 \le y)\,P(Y_2 \le y) \cdots P(Y_n \le y) = [F(y)]^n. \]

Letting $g_{(n)}(y)$ denote the density function of $Y_{(n)}$, we see that, on taking derivatives of both sides,

\[ g_{(n)}(y) = n[F(y)]^{n-1} f(y). \]

The density function for $Y_{(1)}$ can be found in a similar manner. The distribution function of $Y_{(1)}$ is

\[ F_{Y_{(1)}}(y) = P(Y_{(1)} \le y) = 1 - P(Y_{(1)} > y). \]

Because $Y_{(1)}$ is the minimum of $Y_1, Y_2, \ldots, Y_n$, it follows that the event $(Y_{(1)} > y)$ occurs if and only if the events $(Y_i > y)$ occur for $i = 1, 2, \ldots, n$. Because the $Y_i$ are independent and $P(Y_i > y) = 1 - F(y)$ for $i = 1, 2, \ldots, n$, we see that

\begin{align*}
F_{Y_{(1)}}(y) &= P(Y_{(1)} \le y) = 1 - P(Y_{(1)} > y) \\
&= 1 - P(Y_1 > y,\ Y_2 > y,\ \ldots,\ Y_n > y) \\
&= 1 - [P(Y_1 > y)\,P(Y_2 > y) \cdots P(Y_n > y)] \\
&= 1 - [1 - F(y)]^n.
\end{align*}

Thus, if $g_{(1)}(y)$ denotes the density function of $Y_{(1)}$, differentiation of both sides of the last expression yields

\[ g_{(1)}(y) = n[1 - F(y)]^{n-1} f(y). \]
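The two distribution functions just derived, $[F(y)]^n$ for the maximum and $1 - [1 - F(y)]^n$ for the minimum, can be compared with empirical frequencies. The Python sketch below is illustrative only: the function names are hypothetical, and the choice of a uniform (0, 1) population (so that $F(y) = y$), $n = 3$, and the seed are arbitrary assumptions made for the check.

```python
import random


def cdf_of_maximum(population_cdf_value, n):
    """P(Y_(n) <= y) = [F(y)]^n for n independent draws."""
    return population_cdf_value ** n


def cdf_of_minimum(population_cdf_value, n):
    """P(Y_(1) <= y) = 1 - [1 - F(y)]^n for n independent draws."""
    return 1.0 - (1.0 - population_cdf_value) ** n


def empirical_extreme_cdfs(n, y, n_samples, seed):
    """Estimate P(max <= y) and P(min <= y) for samples of n uniform(0, 1) draws."""
    rng = random.Random(seed)
    max_hits = 0
    min_hits = 0
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(n)]
        if max(sample) <= y:
            max_hits += 1
        if min(sample) <= y:
            min_hits += 1
    return max_hits / n_samples, min_hits / n_samples


if __name__ == "__main__":
    n, y = 3, 0.5
    simulated_max, simulated_min = empirical_extreme_cdfs(n, y, n_samples=40_000, seed=11)
    # For the uniform(0, 1) population, F(y) = y.
    print("maximum: simulated", simulated_max, "formula", cdf_of_maximum(y, n))
    print("minimum: simulated", simulated_min, "formula", cdf_of_minimum(y, n))
```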


Let us now consider the case $n = 2$ and find the joint density function for $Y_{(1)}$ and $Y_{(2)}$. The event $(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2)$ means that either $(Y_1 \le y_1,\ Y_2 \le y_2)$ or $(Y_2 \le y_1,\ Y_1 \le y_2)$. [Notice that $Y_{(1)}$ could be either $Y_1$ or $Y_2$, whichever is smaller.] Therefore, for $y_1 \le y_2$, $P(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2)$ is equal to the probability of the union of the two events $(Y_1 \le y_1,\ Y_2 \le y_2)$ and $(Y_2 \le y_1,\ Y_1 \le y_2)$. That is,

\[ P(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2) = P[(Y_1 \le y_1,\ Y_2 \le y_2) \cup (Y_2 \le y_1,\ Y_1 \le y_2)]. \]

Using the additive law of probability and recalling that $y_1 \le y_2$, we see that

\[ P(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2) = P(Y_1 \le y_1,\ Y_2 \le y_2) + P(Y_2 \le y_1,\ Y_1 \le y_2) - P(Y_1 \le y_1,\ Y_2 \le y_1). \]

Because $Y_1$ and $Y_2$ are independent and $P(Y_i \le w) = F(w)$, for $i = 1, 2$, it follows that, for $y_1 \le y_2$,

\begin{align*}
P(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2) &= F(y_1)F(y_2) + F(y_2)F(y_1) - F(y_1)F(y_1) \\
&= 2F(y_1)F(y_2) - [F(y_1)]^2.
\end{align*}

If $y_1 > y_2$ (recall that $Y_{(1)} \le Y_{(2)}$),

\begin{align*}
P(Y_{(1)} \le y_1,\ Y_{(2)} \le y_2) &= P(Y_{(1)} \le y_2,\ Y_{(2)} \le y_2) \\
&= P(Y_1 \le y_2,\ Y_2 \le y_2) = [F(y_2)]^2.
\end{align*}

Summarizing, the joint distribution function of $Y_{(1)}$ and $Y_{(2)}$ is

\[ F_{Y_{(1)}Y_{(2)}}(y_1, y_2) = \begin{cases} 2F(y_1)F(y_2) - [F(y_1)]^2, & y_1 \le y_2, \\ [F(y_2)]^2, & y_1 > y_2. \end{cases} \]

Letting $g_{(1)(2)}(y_1, y_2)$ denote the joint density of $Y_{(1)}$ and $Y_{(2)}$, we see that, on differentiating first with respect to $y_2$ and then with respect to $y_1$,

\[ g_{(1)(2)}(y_1, y_2) = \begin{cases} 2f(y_1)f(y_2), & y_1 \le y_2, \\ 0, & \text{elsewhere}. \end{cases} \]

The same method can be used to find the joint density of $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$, which turns out to be

\[ g_{(1)(2)\cdots(n)}(y_1, y_2, \ldots, y_n) = \begin{cases} n!\,f(y_1)f(y_2)\cdots f(y_n), & y_1 \le y_2 \le \cdots \le y_n, \\ 0, & \text{elsewhere}. \end{cases} \]

The marginal density function for any of the order statistics can be found from this joint density function, but we will not pursue this matter formally in this text.


EXAMPLE 6.16 Electronic components of a certain type have a length of life $Y$, with probability density given by

\[ f(y) = \begin{cases} (1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}. \end{cases} \]

(Length of life is measured in hours.) Suppose that two such components operate independently and in series in a certain system (hence, the system fails when either component fails). Find the density function for $X$, the length of life of the system.

Solution Because the system fails at the first component failure, $X = \min(Y_1, Y_2)$, where $Y_1$ and $Y_2$ are independent random variables with the given density. Then, because $F(y) = 1 - e^{-y/100}$ for $y \ge 0$,

\begin{align*}
f_X(y) = g_{(1)}(y) &= n[1 - F(y)]^{n-1} f(y) \\
&= \begin{cases} 2e^{-y/100}(1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}, \end{cases}
\end{align*}

and it follows that

\[ f_X(y) = \begin{cases} (1/50)e^{-y/50}, & y > 0, \\ 0, & \text{elsewhere}. \end{cases} \]

Thus, the minimum of two exponentially distributed random variables has an exponential distribution. Notice that the mean length of life for each component is 100 hours, whereas the mean length of life for the system is $E(X) = E(Y_{(1)}) = 50 = 100/2$.


EXAMPLE 6.17 Suppose that the components in Example 6.16 operate in parallel (hence, the system does not fail until both components fail). Find the density function for $X$, the length of life of the system.

Solution Now $X = \max(Y_1, Y_2)$, and

\begin{align*}
f_X(y) = g_{(2)}(y) &= n[F(y)]^{n-1} f(y) \\
&= \begin{cases} 2(1 - e^{-y/100})(1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}, \end{cases}
\end{align*}

and, therefore,

\[ f_X(y) = \begin{cases} (1/50)(e^{-y/100} - e^{-y/50}), & y > 0, \\ 0, & \text{elsewhere}. \end{cases} \]

We see here that the maximum of two exponential random variables is not an exponential random variable.
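The series and parallel systems of Examples 6.16 and 6.17 are easy to simulate. The Python sketch below is a hedged illustration: the function names, seed, and sample size are assumptions of the sketch. The series mean of 50 hours is stated in Example 6.16; the parallel mean of 150 hours is not stated in the text and is obtained here by integrating $y\,(1/50)(e^{-y/100} - e^{-y/50})$ over $y > 0$, which gives $(1/50)(100^2 - 50^2) = 150$.

```python
import random


def simulate_system_lifetimes(mean_life, n_systems, seed):
    """Simulate lifetimes of two-component series and parallel systems.

    Each component life is exponential with the given mean. A series system
    fails at the first component failure; a parallel system at the last.
    """
    rng = random.Random(seed)
    series, parallel = [], []
    for _ in range(n_systems):
        y1 = rng.expovariate(1.0 / mean_life)
        y2 = rng.expovariate(1.0 / mean_life)
        series.append(min(y1, y2))
        parallel.append(max(y1, y2))
    return series, parallel


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    series, parallel = simulate_system_lifetimes(mean_life=100.0, n_systems=100_000, seed=5)
    print("series system mean life (theory 50):   ", mean(series))
    print("parallel system mean life (theory 150):", mean(parallel))
```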
Although a rigorous derivation of the density function of the kth-order statistic ($k$ an integer, $1 \le k \le n$) is somewhat complicated, the resulting density function has an intuitively sensible structure. Once that structure is understood, the density can be written down with little difficulty. Think of the density function of a continuous random variable at a particular point as being proportional to the probability that the variable is "close" to that point. That is, if $Y$ is a continuous random variable with density function $f(y)$, then

\[ P(y \le Y \le y + dy) \approx f(y)\,dy. \]

Now consider the kth-order statistic, $Y_{(k)}$. If the kth-largest value is near $y_k$, then $k - 1$ of the $Y$'s must be less than $y_k$, one of the $Y$'s must be near $y_k$, and the remaining $n - k$ of the $Y$'s must be larger than $y_k$. Recall the multinomial distribution, Section 5.9. In the present case, we have three classes of values of $Y$:

Class 1: $Y$'s that have values less than $y_k$; need $k - 1$.
Class 2: $Y$'s that have values near $y_k$; need 1.
Class 3: $Y$'s that have values larger than $y_k$; need $n - k$.

The probabilities of each of these classes are, respectively, $p_1 = P(Y < y_k) = F(y_k)$, $p_2 = P(y_k \le Y \le y_k + dy_k) \approx f(y_k)\,dy_k$, and $p_3 = P(Y > y_k) = 1 - F(y_k)$. Using the multinomial probabilities discussed earlier, we see that

\begin{align*}
P(y_k \le Y_{(k)} \le y_k + dy_k) &\approx P[(k - 1) \text{ from class 1, 1 from class 2, } (n - k) \text{ from class 3}] \\
&\approx \binom{n}{k-1 \;\; 1 \;\; n-k}\, p_1^{k-1}\, p_2^{1}\, p_3^{n-k} \\
&\approx \frac{n!}{(k-1)!\,1!\,(n-k)!}\left\{[F(y_k)]^{k-1} f(y_k)\,dy_k\,[1 - F(y_k)]^{n-k}\right\}
\end{align*}

and

\[ g_{(k)}(y_k)\,dy_k \approx \frac{n!}{(k-1)!\,1!\,(n-k)!}\,[F(y_k)]^{k-1} f(y_k)\,[1 - F(y_k)]^{n-k}\,dy_k. \]

The density of the kth-order statistic and the joint density of two order statistics are given in the following theorem.

THEOREM 6.5 Let $Y_1, \ldots, Y_n$ be independent identically distributed continuous random variables with common distribution function $F(y)$ and common density function $f(y)$. If $Y_{(k)}$ denotes the kth-order statistic, then the density function of $Y_{(k)}$ is given by

\[ g_{(k)}(y_k) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(y_k)]^{k-1}\,[1 - F(y_k)]^{n-k}\,f(y_k), \quad -\infty < y_k < \infty. \]

If $j$ and $k$ are two integers such that $1 \le j < k \le n$, the joint density of $Y_{(j)}$ and $Y_{(k)}$ is given by

\begin{align*}
g_{(j)(k)}(y_j, y_k) = {} & \frac{n!}{(j-1)!\,(k-1-j)!\,(n-k)!}\,[F(y_j)]^{j-1} \\
& \times [F(y_k) - F(y_j)]^{k-1-j} \times [1 - F(y_k)]^{n-k}\,f(y_j)\,f(y_k), \quad -\infty < y_j < y_k < \infty.
\end{align*}
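The first formula of Theorem 6.5 translates directly into a short function. The Python sketch below is one possible illustration, with hypothetical function names and an exponential population with mean 100 assumed for the comparison. With $n = 2$, setting $k = 1$ should reproduce the density $(1/50)e^{-y/50}$ of Example 6.16, and $k = 2$ should reproduce $(1/50)(e^{-y/100} - e^{-y/50})$ from Example 6.17.

```python
import math


def order_statistic_density(y, k, n, cdf, pdf):
    """Density of the kth-order statistic from n i.i.d. draws.

    cdf and pdf are the common distribution and density functions,
    passed in as callables of a single argument.
    """
    coefficient = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    big_f = cdf(y)
    return coefficient * big_f ** (k - 1) * (1.0 - big_f) ** (n - k) * pdf(y)


def exponential_cdf(y, mean=100.0):
    """Distribution function of an exponential variable with the given mean."""
    return 1.0 - math.exp(-y / mean) if y > 0 else 0.0


def exponential_pdf(y, mean=100.0):
    """Density of an exponential variable with the given mean."""
    return math.exp(-y / mean) / mean if y > 0 else 0.0


if __name__ == "__main__":
    y = 100.0
    minimum = order_statistic_density(y, 1, 2, exponential_cdf, exponential_pdf)
    maximum = order_statistic_density(y, 2, 2, exponential_cdf, exponential_pdf)
    print("k = 1:", minimum, "vs", math.exp(-y / 50.0) / 50.0)
    print("k = 2:", maximum, "vs", (math.exp(-y / 100.0) - math.exp(-y / 50.0)) / 50.0)
```

Passing the distribution and density functions as arguments keeps the formula separate from any particular population, which mirrors how the theorem is stated.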


The heuristic, intuitive derivation of the joint density given in Theorem 6.5 is similar to that given earlier for the density of a single order statistic. For $y_j < y_k$, the joint density can be interpreted as the probability that the jth largest observation is close to $y_j$ and the kth largest is close to $y_k$. Define five classes of values of $Y$:

Class 1: $Y$'s that have values less than $y_j$; need $j - 1$.
Class 2: $Y$'s that have values near $y_j$; need 1.
Class 3: $Y$'s that have values between $y_j$ and $y_k$; need $k - 1 - j$.
Class 4: $Y$'s that have values near $y_k$; need 1.
Class 5: $Y$'s that have values larger than $y_k$; need $n - k$.

Again, use the multinomial distribution to complete the heuristic argument.


EXAMPLE 6.18 Suppose that $Y_1, Y_2, \ldots, Y_5$ denotes a random sample from a uniform distribution defined on the interval (0, 1). That is,

\[ f(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

Find the density function for the second-order statistic. Also, give the joint density function for the second- and fourth-order statistics.

Solution The distribution function associated with each of the $Y$'s is

\[ F(y) = \begin{cases} 0, & y < 0, \\ y, & 0 \le y \le 1, \\ 1, & y > 1. \end{cases} \]

The density function of the second-order statistic, $Y_{(2)}$, can be obtained directly from Theorem 6.5 with $n = 5$, $k = 2$. Thus, with $f(y)$ and $F(y)$ as noted,

\begin{align*}
g_{(2)}(y_2) &= \frac{5!}{(2-1)!\,(5-2)!}\,[F(y_2)]^{2-1}\,[1 - F(y_2)]^{5-2}\,f(y_2), \quad -\infty < y_2 < \infty, \\
&= \begin{cases} 20\,y_2(1 - y_2)^3, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}
\end{align*}

The preceding density is a beta density with $\alpha = 2$ and $\beta = 4$. In general, the kth-order statistic based on a sample of size $n$ from a uniform (0, 1) distribution has a beta density with $\alpha = k$ and $\beta = n - k + 1$.

The joint density of the second- and fourth-order statistics is readily obtained from the second result in Theorem 6.5. With $f(y)$ and $F(y)$ as before, $j = 2$, $k = 4$, and $n = 5$,

\begin{align*}
g_{(2)(4)}(y_2, y_4) &= \frac{5!}{(2-1)!\,(4-1-2)!\,(5-4)!}\,[F(y_2)]^{2-1}\,[F(y_4) - F(y_2)]^{4-1-2} \\
&\qquad \times [1 - F(y_4)]^{5-4}\,f(y_2)\,f(y_4), \quad -\infty < y_2 < y_4 < \infty, \\
&= \begin{cases} 5!\,y_2(y_4 - y_2)(1 - y_4), & 0 \le y_2 < y_4 \le 1, \\ 0, & \text{elsewhere}. \end{cases}
\end{align*}

Of course, this joint density can be used to evaluate joint probabilities about $Y_{(2)}$ and $Y_{(4)}$ or to evaluate the expected value of functions of these two variables.
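The density $20\,y_2(1 - y_2)^3$ from Example 6.18 can be examined numerically. The Python sketch below is illustrative, with hypothetical function names and an arbitrary seed. It integrates the density with a simple midpoint rule (to confirm total probability 1 and a mean of 1/3, the mean $\alpha/(\alpha + \beta)$ of a beta density with $\alpha = 2$, $\beta = 4$) and compares that mean with the average second-smallest value in simulated samples of five uniform draws.

```python
import random


def second_order_density(y):
    """Density 20 y (1 - y)^3 of Y_(2) for n = 5 uniform(0, 1) draws."""
    return 20.0 * y * (1.0 - y) ** 3 if 0.0 <= y <= 1.0 else 0.0


def midpoint_integral(func, lower, upper, cells=10_000):
    """Midpoint-rule approximation to the integral of func over [lower, upper]."""
    width = (upper - lower) / cells
    return width * sum(func(lower + (i + 0.5) * width) for i in range(cells))


def simulated_mean_of_order_statistic(k, n, n_samples, seed):
    """Average of the kth-smallest value over repeated samples of n uniform draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        ordered = sorted(rng.random() for _ in range(n))
        total += ordered[k - 1]
    return total / n_samples


if __name__ == "__main__":
    print("total probability:", midpoint_integral(second_order_density, 0.0, 1.0))
    print("mean from density:", midpoint_integral(lambda y: y * second_order_density(y), 0.0, 1.0))
    print("mean from simulation:", simulated_mean_of_order_statistic(2, 5, 40_000, seed=7))
```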


Exercises

6.72 Let $Y_1$ and $Y_2$ be independent and uniformly distributed over the interval (0, 1). Find
a the probability density function of $U_1 = \min(Y_1, Y_2)$.
b $E(U_1)$ and $V(U_1)$.

6.73 As in Exercise 6.72, let $Y_1$ and $Y_2$ be independent and uniformly distributed over the interval (0, 1). Find
a the probability density function of $U_2 = \max(Y_1, Y_2)$.
b $E(U_2)$ and $V(U_2)$.

6.74 Let $Y_1, Y_2, \ldots, Y_n$ be independent, uniformly distributed random variables on the interval $[0, \theta]$. Find the
a probability distribution function of $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$.
b density function of $Y_{(n)}$.
c mean and variance of $Y_{(n)}$.

6.75 Refer to Exercise 6.74. Suppose that the number of minutes that you need to wait for a bus is uniformly distributed on the interval [0, 15]. If you take the bus five times, what is the probability that your longest wait is less than 10 minutes?

*6.76 Let $Y_1, Y_2, \ldots, Y_n$ be independent, uniformly distributed random variables on the interval $[0, \theta]$.
a Find the density function of $Y_{(k)}$, the kth-order statistic, where $k$ is an integer between 1 and $n$.
b Use the result from part (a) to find $E(Y_{(k)})$.
c Find $V(Y_{(k)})$.
d Use the result from part (c) to find $E(Y_{(k)} - Y_{(k-1)})$, the mean difference between two successive order statistics. Interpret this result.

*6.77 Let $Y_1, Y_2, \ldots, Y_n$ be independent, uniformly distributed random variables on the interval $[0, \theta]$.
a Find the joint density function of $Y_{(j)}$ and $Y_{(k)}$ where $j$ and $k$ are integers $1 \le j < k \le n$.
b Use the result from part (a) to find $\mathrm{Cov}(Y_{(j)}, Y_{(k)})$ when $j$ and $k$ are integers $1 \le j < k \le n$.
c Use the result from part (b) and Exercise 6.76 to find $V(Y_{(k)} - Y_{(j)})$, the variance of the difference between two order statistics.

6.78 Refer to Exercise 6.76. If $Y_1, Y_2, \ldots, Y_n$ are independent, uniformly distributed random variables on the interval [0, 1], show that $Y_{(k)}$, the kth-order statistic, has a beta density function with $\alpha = k$ and $\beta = n - k + 1$.

6.79 Refer to Exercise 6.77. If $Y_1, Y_2, \ldots, Y_n$ are independent, uniformly distributed random variables on the interval $[0, \theta]$, show that $U = Y_{(1)}/Y_{(n)}$ and $Y_{(n)}$ are independent.

6.80 Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables, each with a beta distribution, with $\alpha = \beta = 2$. Find
a the probability distribution function of $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$.
b the density function of $Y_{(n)}$.
c $E(Y_{(n)})$ when $n = 2$.

6.81 Let $Y_1, Y_2, \ldots, Y_n$ be independent, exponentially distributed random variables with mean $\beta$.
a Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ has an exponential distribution, with mean $\beta/n$.
b If $n = 5$ and $\beta = 2$, find $P(Y_{(1)} \le 3.6)$.

6.82 If $Y$ is a continuous random variable and $m$ is the median of the distribution, then $m$ is such that $P(Y \le m) = P(Y \ge m) = 1/2$. If $Y_1, Y_2, \ldots, Y_n$ are independent, exponentially distributed random variables with mean $\beta$ and median $m$, Example 6.17 implies that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ does not have an exponential distribution. Use the general form of $F_{Y_{(n)}}(y)$ to show that $P(Y_{(n)} > m) = 1 - (.5)^n$.

6.83 Refer to Exercise 6.82. If $Y_1, Y_2, \ldots, Y_n$ is a random sample from any continuous distribution with mean $m$, what is $P(Y_{(n)} > m)$?

6.84 Refer to Exercise 6.26. The Weibull density function is given by

\[ f(y) = \begin{cases} \dfrac{1}{\alpha}\,m\,y^{m-1} e^{-y^m/\alpha}, & y > 0, \\ 0, & \text{elsewhere}, \end{cases} \]

where $\alpha$ and $m$ are positive constants. If a random sample of size $n$ is taken from a Weibull distributed population, find the distribution function and density function for $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$. Does $Y_{(1)}$ have a Weibull distribution?

6.85 Let $Y_1$ and $Y_2$ be independent and uniformly distributed over the interval (0, 1). Find $P(2Y_{(1)} < Y_{(2)})$.

6.86 Let $Y_1, Y_2, \ldots, Y_n$ be independent, exponentially distributed random variables with mean $\beta$. Give the
a density function for $Y_{(k)}$, the kth-order statistic, where $k$ is an integer between 1 and $n$.
b joint density function for $Y_{(j)}$ and $Y_{(k)}$ where $j$ and $k$ are integers $1 \le j < k \le n$.

6.87 The opening prices per share $Y_1$ and $Y_2$ of two similar stocks are independent random variables, each with a density function given by

\[ f(y) = \begin{cases} (1/2)e^{-(1/2)(y-4)}, & y \ge 4, \\ 0, & \text{elsewhere}. \end{cases} \]

On a given morning, an investor is going to buy shares of whichever stock is less expensive. Find the
a probability density function for the price per share that the investor will pay.
b expected cost per share that the investor will pay.

6.88 Suppose that the length of time $Y$ it takes a worker to complete a certain task has the probability density function given by

\[ f(y) = \begin{cases} e^{-(y-\theta)}, & y > \theta, \\ 0, & \text{elsewhere}, \end{cases} \]

where $\theta$ is a positive constant that represents the minimum time until task completion. Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of completion times from this distribution. Find
a the density function for $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$.
b $E(Y_{(1)})$.

*6.89 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution $f(y) = 1$, $0 \le y \le 1$. Find the probability density function for the range $R = Y_{(n)} - Y_{(1)}$.

*6.90 Suppose that the number of occurrences of a certain event in time interval $(0, t)$ has a Poisson distribution. If we know that $n$ such events have occurred in $(0, t)$, then the actual times, measured from 0, for the occurrences of the event in question form an ordered set of random variables, which we denote by $W_{(1)} \le W_{(2)} \le \cdots \le W_{(n)}$. [$W_{(i)}$ actually is the waiting time from 0 until the occurrence of the ith event.] It can be shown that the joint density function for $W_{(1)}, W_{(2)}, \ldots, W_{(n)}$ is given by

\[ f(w_1, w_2, \ldots, w_n) = \begin{cases} \dfrac{n!}{t^n}, & w_1 \le w_2 \le \cdots \le w_n \le t, \\ 0, & \text{elsewhere}. \end{cases} \]

[This is the density function for an ordered sample of size $n$ from a uniform distribution on the interval $(0, t)$.] Suppose that telephone calls coming into a switchboard follow a Poisson distribution with a mean of ten calls per minute. A slow period of two minutes' duration had only four calls. Find the
a probability that all four calls came in during the first minute; that is, find $P(W_{(4)} \le 1)$.
b expected waiting time from the start of the two-minute period until the fourth call.

*6.91 Suppose that $n$ electronic components, each having an exponentially distributed length of life with mean $\theta$, are put into operation at the same time. The components operate independently and are observed until $r$ have failed ($r \le n$). Let $W_j$ denote the length of time until the jth failure, with $W_1 \le W_2 \le \cdots \le W_r$. Let $T_j = W_j - W_{j-1}$ for $j \ge 2$ and $T_1 = W_1$. Notice that $T_j$ measures the time elapsed between successive failures.
a Show that $T_j$, for $j = 1, 2, \ldots, r$, has an exponential distribution with mean $\theta/(n - j + 1)$.
b Show that
\[ U_r = \sum_{j=1}^{r} W_j + (n - r)W_r = \sum_{j=1}^{r} (n - j + 1)T_j \]
and hence that $E(U_r) = r\theta$. [$U_r$ is called the total observed life, and we can use $U_r/r$ as an approximation to (or "estimator" of) $\theta$.]


6.8 Summary


This chapter has been concerned with finding probability distributions for functions of random variables. This is an important problem in statistics because estimators of population parameters are functions of random variables. Hence, it is necessary to know something about the probability distributions of these functions (or estimators) in order to evaluate the goodness of our statistical procedures. A discussion of estimation will be presented in Chapters 8 and 9.

The methods for finding probability distributions for functions of random variables are the distribution function method (Section 6.3), the transformation method (Section 6.4), and the moment-generating-function method (Section 6.5). It should be noted that no particular method is best for all situations because the method of solution depends a great deal upon the nature of the function involved. If $U_1$ and $U_2$ are two functions of the continuous random variables $Y_1$ and $Y_2$, the joint density function for $U_1$ and $U_2$ can be found using the Jacobian technique in Section 6.6. Facility for handling these methods can be achieved only through practice. The exercises at the end of each section and at the end of the chapter provide a good starting point.

The density functions of order statistics were presented in Section 6.7.

Some special functions of random variables that are particularly useful in statistical inference will be considered in Chapter 7.


Supplementary Exercises


6.92 If $Y_1$ and $Y_2$ are independent and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$, find the probability density function for $U = (1/2)(Y_1 - 3Y_2)$.

6.93 When current $I$ flows through resistance $R$, the power generated is given by $W = I^2 R$. Suppose that $I$ has a uniform distribution over the interval (0, 1) and $R$ has a density function given by

\[ f(r) = \begin{cases} 2r, & 0 \le r \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

Find the probability density function for $W$. (Assume that $I$ is independent of $R$.)
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Two efficiency experts take independent measurements Y, and Y; on the length of time workers 
take to complete a certain task. Each measurement is assumed to have the density function 
given by 
(1/4)ye??, у> 0, 
fo)= | 
ү elsewhere. 


Find the density function for the average U = (1/2)(¥; + У). [Hint: Use the method of 
moment-generating functions.] 


Let Y, and У be independent and uniformly distributed over the interval (0, 1). Find the 
probability density function of each of the following: 


a О = Yi/ Ys. 
b 0 = —]n (YiY;). 
с U3 = ҮҮ. 


Suppose that Y, is normally distributed with mean 5 and variance 1 and Y is normally distributed 
with mean 4 and variance 3. If Y, and Y; are independent, what is P(Y; > Y2)? 


Suppose that Y; is a binomial random variable with four trials and success probability .2 and 
that Y» is an independent binomial random variable with three trials and success probability 
5. Let = Ү + Y2. According to Exercise 6.53(e), W does not have a binomial distribution. 
Find the probability mass function for W. [Hint: P(W = 0) = P(Y, = 0, Y = 0); P(W = 
1) = P = 1, Yo = 0) + P(Y = 0, № = 1); etc] 


The length of time that a machine operates without failure is denoted by $Y_1$ and the length of 
time to repair a failure is denoted by $Y_2$. After a repair is made, the machine is assumed to 
operate like a new machine. $Y_1$ and $Y_2$ are independent and each has the density function 

$$f(y) = \begin{cases} e^{-y}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability density function for $U = Y_1/(Y_1 + Y_2)$, the proportion of time that the 
machine is in operation during any one operation-repair cycle. 


Refer to Exercise 6.98. Show that $U$, the proportion of time that the machine is operating during 
any one operation-repair cycle, is independent of $Y_1 + Y_2$, the length of the cycle. 


The time until failure of an electronic device has an exponential distribution with mean 15 
months. If a random sample of five such devices are tested, what is the probability that the first 
failure among the five devices occurs 


a after 9 months? 
b before 12 months? 


A parachutist wants to land at a target $T$, but she finds that she is equally likely to land at 
any point on a straight line $(A, B)$, of which $T$ is the midpoint. Find the probability density 
function of the distance between her landing point and the target. [Hint: Denote $A$ by $-1$, $B$ by 
$+1$, and $T$ by 0. Then the parachutist's landing point has a coordinate $X$, which is uniformly 
distributed between $-1$ and $+1$. The distance between $X$ and $T$ is $|X|$.] 


Two sentries are sent to patrol a road 1 mile long. The sentries are sent to points chosen 
independently and at random along the road. Find the probability that the sentries will be less 
than 1/2 mile apart when they reach their assigned posts. 


Let $Y_1$ and $Y_2$ be independent, standard normal random variables. Find the probability density 
function of $U = Y_1/Y_2$. 


Let $Y_1$ and $Y_2$ be independent random variables, each having the same geometric distribution. 


a Find $P(Y_1 = Y_2) = P(Y_1 - Y_2 = 0)$. [Hint: Your answer will involve evaluating an infinite 
geometric series. The results in Appendix A1.11 will be useful.] 
b Find $P(Y_1 - Y_2 = 1)$. 
*c If $U = Y_1 - Y_2$, find the (discrete) probability function for $U$. [Hint: Part (a) gives 
$P(U = 0)$, and part (b) gives $P(U = 1)$. Consider the positive and negative integer values for $U$ 
separately.] 


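A random variable $Y$ has a beta distribution of the second kind if, for $\alpha > 0$ and $\beta > 0$, its 
density is 

$$f_Y(y) = \begin{cases} \dfrac{y^{\alpha - 1}}{B(\alpha, \beta)(1 + y)^{\alpha + \beta}}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Derive the density function of $U = 1/(1 + Y)$. 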


If Y is a continuous random variable with distribution function F(y), find the probability 
density function of U = F(Y). 


Let $Y$ be uniformly distributed over the interval $(-1, 3)$. Find the probability density function 
of $U = Y^2$. 


If $Y$ denotes the length of life of a component and $F(y)$ is the distribution function of $Y$, then 
$P(Y > y) = 1 - F(y)$ is called the reliability of the component. Suppose that a system consists 
of four components with identical reliability functions, 1 — F(y), operating as indicated in 
Figure 6.10. The system operates correctly if an unbroken chain of components is in operation 
between A and B. If the four components operate independently, find the reliability of the 
system in terms of F (y). 


The percentage of alcohol in a certain compound is a random variable $Y$, with the following 
density function: 

$$f(y) = \begin{cases} 20 y^3 (1 - y), & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}$$

Suppose that the compound's selling price depends on its alcohol content. Specifically, if 
$1/3 < y < 2/3$, the compound sells for $C_1$ dollars per gallon; otherwise, it sells for $C_2$ dollars 
per gallon. If the production cost is $C_3$ dollars per gallon, find the probability distribution of 
the profit per gallon. 


An engineer has observed that the gap times between vehicles passing a certain point on a 
highway have an exponential distribution with mean 10 seconds. Find the 


a probability that the next gap observed will be no longer than one minute. 


b probability density function for the sum of the next four gap times to be observed. What 
assumptions are necessary for this answer to be correct? 


If a random variable $U$ is normally distributed with mean $\mu$ and variance $\sigma^2$ and $Y = e^U$ 
[equivalently, $U = \ln(Y)$], then $Y$ is said to have a log-normal distribution. The log-normal 
distribution is often used in the biological and physical sciences to model sizes, by volume or 
weight, of various quantities, such as crushed coal particles, bacteria colonies, and individual 
animals. Let $U$ and $Y$ be as stated. Show that 


a the density function for $Y$ is 

$$f(y) = \begin{cases} \left(\dfrac{1}{y \sigma \sqrt{2\pi}}\right) e^{-(\ln y - \mu)^2/(2\sigma^2)}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

b $E(Y) = e^{\mu + \sigma^2/2}$ and $V(Y) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$. [Hint: Recall that $E(Y) = E(e^U)$ and 
$E(Y^2) = E(e^{2U})$, where $U$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Recall 
that the moment-generating function of $U$ is $m_U(t) = e^{\mu t + \sigma^2 t^2/2}$.] 


If a random variable $U$ has a gamma distribution with parameters $\alpha > 0$ and $\beta > 0$, then 
$Y = e^U$ [equivalently, $U = \ln(Y)$] is said to have a log-gamma distribution. The log-gamma 
distribution is used by actuaries as part of an important model for the distribution of insurance 
claims. Let $U$ and $Y$ be as stated. 


a Show that the density function for $Y$ is 

$$f(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{-(1+\beta)/\beta} (\ln y)^{\alpha - 1}, & y > 1, \\ 0, & \text{elsewhere.} \end{cases}$$

b If $\beta < 1$, show that $E(Y) = (1 - \beta)^{-\alpha}$. [See the hint for part (c).] 
c If $\beta < .5$, show that $V(Y) = (1 - 2\beta)^{-\alpha} - (1 - \beta)^{-2\alpha}$. [Hint: Recall that $E(Y) = E(e^U)$ 
and $E(Y^2) = E(e^{2U})$, where $U$ is gamma distributed with parameters $\alpha > 0$ and $\beta > 0$, 
and that the moment-generating function of a gamma-distributed random variable only 
exists if $t < \beta^{-1}$; see Example 4.13.] 


Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1 Y_2$ and $U_2 = Y_2$. 


a Show that the joint density of $(U_1, U_2)$ is 

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}\!\left(\frac{u_1}{u_2}, u_2\right) \frac{1}{|u_2|}.$$

b Show that the marginal density function for $U_1$ is 

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}\!\left(\frac{u_1}{u_2}, u_2\right) \frac{1}{|u_2|}\, du_2.$$

c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is 

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}\!\left(\frac{u_1}{u_2}\right) f_{Y_2}(u_2)\, \frac{1}{|u_2|}\, du_2.$$


A machine produces spherical containers whose radii vary according to the probability density 
function given by 

$$f(r) = \begin{cases} 2r, & 0 \le r \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability density function for the volume of the containers. 


Let $v$ denote the volume of a three-dimensional figure. Let $Y$ denote the number of particles 
observed in volume $v$ and assume that $Y$ has a Poisson distribution with mean $\lambda v$. The particles 
might represent pollution particles in air, bacteria in water, or stars in the heavens. 


a If a point is chosen at random within the volume $v$, show that the distance $R$ to the nearest 
particle has the probability density function given by 

$$f(r) = \begin{cases} 4\lambda\pi r^2 e^{-(4/3)\lambda\pi r^3}, & r > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

b If $R$ is as in part (a), show that $U = R^3$ has an exponential distribution. 

Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1 - Y_2$ and $U_2 = Y_2$. 

a Show that the joint density of $(U_1, U_2)$ is 

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 + u_2, u_2).$$

b Show that the marginal density function for $U_1$ is 

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 + u_2, u_2)\, du_2.$$

c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is 

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 + u_2)\, f_{Y_2}(u_2)\, du_2.$$


Introduction 


In Chapter 6, we presented methods for finding the distributions of functions of random 
variables. Throughout this chapter, we will be working with functions of the variables 
$Y_1, Y_2, \ldots, Y_n$ observed in a random sample selected from a population of interest. 
As discussed in Chapter 6, the random variables $Y_1, Y_2, \ldots, Y_n$ are independent and 
have the same distribution. Certain functions of the random variables observed in a 
sample are used to estimate or make decisions about unknown population parameters. 

For example, suppose that we want to estimate a population mean $\mu$. If we obtain 
a random sample of $n$ observations, $y_1, y_2, \ldots, y_n$, it seems reasonable to estimate 
$\mu$ with the sample mean 

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

The goodness of this estimate depends on the behavior of the random variables 
$Y_1, Y_2, \ldots, Y_n$ and the effect that this behavior has on $\bar{Y} = (1/n) \sum_{i=1}^{n} Y_i$. Notice 
that the random variable $\bar{Y}$ is a function of (only) the random variables $Y_1, Y_2, \ldots, Y_n$ 
and the (constant) sample size $n$. The random variable $\bar{Y}$ is therefore an example of 
a statistic. 


DEFINITION 7.1 

A statistic is a function of the observable random variables in a sample and 
known constants. 


You have already encountered many statistics: the sample mean $\bar{Y}$, the sample 
variance $S^2$, $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$, $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$, the range 
$R = Y_{(n)} - Y_{(1)}$, the sample median, and so on. Statistics are used to make inferences 
(estimates or decisions) about unknown population parameters. Because all statistics 
are functions of the random variables observed in a sample, all statistics are random 
variables. Consequently, all statistics have probability distributions, which we will call 
their sampling distributions. From a practical point of view, the sampling distribution 
of a statistic provides a theoretical model for the relative frequency histogram of the 
possible values of the statistic that we would observe through repeated sampling. 
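
As a small illustration of Definition 7.1, the sketch below computes several of the statistics just listed from one observed sample. The function name `summary_statistics` and the data values are hypothetical, chosen only for illustration; each returned quantity depends only on the observations and known constants.

```python
import statistics

def summary_statistics(sample):
    """Return several statistics computed from one observed sample."""
    ordered = sorted(sample)
    return {
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),  # sample variance, divisor n - 1
        "min": ordered[0],                        # smallest order statistic
        "max": ordered[-1],                       # largest order statistic
        "range": ordered[-1] - ordered[0],
        "median": statistics.median(sample),
    }

# Hypothetical observed sample (illustrative values only).
observed = [3, 6, 1, 4, 6]
stats = summary_statistics(observed)
print(stats)
```

A different sample would generally give different values for each entry, which is why each statistic has a sampling distribution.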

The next example provides a sampling distribution of the sample mean when 
sampling from a familiar population, the one associated with tossing a balanced die. 


EXAMPLE 7.1 

A balanced die is tossed three times. Let $Y_1$, $Y_2$, and $Y_3$ denote the number of spots 
observed on the upper face for tosses 1, 2, and 3, respectively. Suppose we are interested 
in $\bar{Y} = (Y_1 + Y_2 + Y_3)/3$, the average number of spots observed in a sample of 
size 3. What are the mean, $\mu_{\bar{Y}}$, and standard deviation, $\sigma_{\bar{Y}}$, of $\bar{Y}$? How can we find the 
sampling distribution of $\bar{Y}$? 


Solution 

In Exercise 3.22, you showed that $\mu = E(Y_i) = 3.5$ and $\sigma^2 = V(Y_i) = 2.9167$, $i = 1, 2, 3$. 
Since $Y_1$, $Y_2$, and $Y_3$ are independent random variables, the result derived in 
Example 5.27 (using Theorem 5.12) implies that 

$$E(\bar{Y}) = \mu_{\bar{Y}} = \mu = 3.5, \qquad V(\bar{Y}) = \sigma^2_{\bar{Y}} = \frac{\sigma^2}{3} = \frac{2.9167}{3} = .9722, \qquad \sigma_{\bar{Y}} = \sqrt{.9722} = .9860.$$

How can we derive the distribution of the random variable $\bar{Y}$? The possible values 
of the random variable $W = Y_1 + Y_2 + Y_3$ are 3, 4, 5, ..., 18 and $\bar{Y} = W/3$. Because 
the die is balanced, each of the $6^3 = 216$ distinct values of the multivariate random 
variable $(Y_1, Y_2, Y_3)$ is equally likely and 

$$P(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3) = p(y_1, y_2, y_3) = 1/216, \qquad y_i = 1, 2, \ldots, 6, \quad i = 1, 2, 3.$$

Therefore, 

$$\begin{aligned}
P(\bar{Y} = 1) &= P(W = 3) = p(1, 1, 1) = 1/216 \\
P(\bar{Y} = 4/3) &= P(W = 4) = p(1, 1, 2) + p(1, 2, 1) + p(2, 1, 1) = 3/216 \\
P(\bar{Y} = 5/3) &= P(W = 5) = p(1, 1, 3) + p(1, 3, 1) + p(3, 1, 1) \\
&\qquad + p(1, 2, 2) + p(2, 1, 2) + p(2, 2, 1) = 6/216
\end{aligned}$$

The probabilities $P(\bar{Y} = i/3)$, $i = 6, 7, \ldots, 18$, are obtained similarly. 
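
The sample-point enumeration in Example 7.1 is tedious by hand but short to carry out by machine. The following is a minimal sketch, not part of the original example: it lists all 216 equally likely outcomes, tabulates $W$, and converts counts to exact probabilities for $\bar{Y} = W/3$. The function name `exact_mean_distribution` is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def exact_mean_distribution(n_tosses=3, faces=range(1, 7)):
    """Exact sampling distribution of the mean of n_tosses balanced-die tosses."""
    faces = list(faces)
    total = len(faces) ** n_tosses  # 6**3 = 216 equally likely sample points
    counts = Counter(sum(outcome) for outcome in product(faces, repeat=n_tosses))
    # Map each possible value of Ybar = W / n_tosses to its exact probability.
    return {Fraction(w, n_tosses): Fraction(c, total) for w, c in sorted(counts.items())}

dist = exact_mean_distribution()
mean = sum(y * p for y, p in dist.items())
variance = sum((y - mean) ** 2 * p for y, p in dist.items())

# Show the three probabilities worked out in the example (fractions are reduced).
for y, p in list(dist.items())[:3]:
    print(f"P(Ybar = {y}) = {p}")
print("E(Ybar) =", mean, " V(Ybar) =", variance)
```

The computed mean and variance are exact fractions, so they can be compared directly with the values 3.5 and .9722 obtained from Theorem 5.12.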
FIGURE 7.1 (a) Simulated sampling distribution for $\bar{Y}$, Example 7.1; (b) mean and standard deviation of the 4000 simulated values of $\bar{Y}$ 

(a) Histogram (not reproduced): frequency versus mean of 3 dice (horizontal axis from 1 to 6), number of rolls = 4000. 

(b) Stat report: 
Pop Prob: (1) 0.167 (2) 0.167 (3) 0.167 (4) 0.167 (5) 0.167 (6) 0.167 
Population: Mean = 3.500 StDev = 1.708 
Samples = 4000 of size 3 
Mean = 3.495 
StDev = 0.981 
+/- 1 StDev: 0.683 
+/- 2 StDev: 0.962 
+/- 3 StDev: 1.000 


The derivation of the sampling distribution of the random variable $\bar{Y}$ sketched in 
Example 7.1 utilizes the sample point approach that was introduced in Chapter 2. 
Although it is not difficult to complete the calculations in Example 7.1 and give 
the exact sampling distribution for $\bar{Y}$, the process is tedious. How can we get an 
idea about the shape of this sampling distribution without going to the bother of 
completing these calculations? One way is to simulate the sampling distribution by 
taking repeated independent samples each of size 3, computing the observed value $\bar{y}$ 
for each sample, and constructing a histogram of these observed values. The result 
of one such simulation is given in Figure 7.1(a), a plot obtained using the applet 
DiceSample (accessible at www.thomsonedu.com/statistics/wackerly). 

What do you observe in Figure 7.1(a)? As predicted, the maximum observed value 
of $\bar{Y}$ is 6, and the minimum value is 1. Also, the values obtained in the simulation 
accumulate in a mound-shaped manner approximately centered on 3.5, the theoretical 
mean of $\bar{Y}$. In Figure 7.1(b), we see that the average and standard deviation of 
the 4000 simulated values of $\bar{Y}$ are very close to the theoretical values obtained in 
Example 7.1. 
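
A simulation of this kind can be sketched in a few lines of code. The example below is not the DiceSample applet; it is an independent illustration that assumes Python's standard pseudo-random number generator, with a hypothetical function name and an arbitrary seed. It draws 4000 samples of size 3 and summarizes the simulated values of $\bar{Y}$.

```python
import random
import statistics

def simulate_sample_means(n_samples=4000, sample_size=3, seed=1):
    """Simulate n_samples means, each from sample_size tosses of a balanced die."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    means = []
    for _ in range(n_samples):
        tosses = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(sum(tosses) / sample_size)
    return means

means = simulate_sample_means()
print("smallest and largest simulated mean:", min(means), max(means))
print("average of simulated means:", round(statistics.mean(means), 3))
print("st. dev. of simulated means:", round(statistics.stdev(means), 3))
```

The printed average and standard deviation should fall near the theoretical values 3.5 and .9860, although the exact figures depend on the seed and will not match Figure 7.1(b) digit for digit.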
Some of the exercises at the end of this section use the applet DiceSample to explore 
the simulated sampling distribution of $\bar{Y}$ for different sample sizes and for die tosses 
involving loaded dice. Other applets are used to simulate the sampling distributions 
for the mean and variance of samples taken from a mound-shaped distribution. 

Like the simulated sampling distributions that you will observe in the exercises, 
the form of the theoretical sampling distribution of any statistic will depend upon the 
distribution of the observable random variables in the sample. In the next section, 
we will use the methods of Chapter 6 to derive the sampling distributions for some 
statistics used to make inferences about the parameters of a normal distribution. 


Exercises 


Applet Exercise In Example 7.1, we derived the mean and variance of the random variable 
$\bar{Y}$ based on a sample of size 3 from a familiar population, the one associated with tossing a 
balanced die. Recall that if $Y$ denotes the number of spots observed on the upper face on a 
single toss of a balanced die, as in Exercise 3.22, 


$P(Y = i) = 1/6, \quad i = 1, 2, \ldots, 6,$ 
$\mu = E(Y) = 3.5,$ 
$\text{Var}(Y) = 2.9167.$ 


Use the applet DiceSample (at www.thomsonedu.com/statistics/wackerly) to complete the fol- 
lowing. 


a Use the button “Roll One Set" to take a sample of size 3 from the die-tossing population. 
What value did you obtain for the mean of this sample? Where does this value fall on the 
histogram? Is the value that you obtained equal to one of the possible values associated 
with a single toss of a balanced die? Why or why not? 


b Use the button "Roll One Set" again to obtain another sample of size 3 from the die-tossing 
population. What value did you obtain for the mean of this new sample? Is the value that 
you obtained equal to the value you obtained in part (a)? Why or why not? 

c Use the button "Roll One Set" eight more times to obtain a total of ten values of the sample 
mean. Look at the histogram of these ten means. What do you observe? How many different 
values for the sample mean did you obtain? Were any values observed more than once? 


d Use the button "Roll 10 Sets" until you have obtained and plotted 100 realized values for 
the sample mean, $\bar{Y}$. What do you observe about the shape of the histogram of the 100 
realized values? Click on the button "Show Stats" to see the mean and standard deviation 
of the 100 values ($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$) that you observed. How does the average of the 100 
values of $\bar{y}_i$, $i = 1, 2, \ldots, 100$ compare to $E(Y)$, the expected number of spots on a single 
toss of a balanced die? (Notice that the mean and standard deviation of $Y$ that you computed 
in Exercise 3.22 are given on the second line of the "Stat Report" pop-up screen.) 

e How does the standard deviation of the 100 values of $\bar{y}_i$, $i = 1, 2, \ldots, 100$ compare to the 
standard deviation of $Y$ given on the second line of the "Stat Report" pop-up screen? 

f Click the button "Roll 1000 Sets" a few times, observing changes to the histogram as 
you generate more and more realized values of the sample mean. How does the resulting 
histogram compare to the graph given in Figure 7.1(a)? 


Refer to Example 7.1 and Exercise 7.1. 


a Use the method of Example 7.1 to find the exact value of $P(\bar{Y} = 2)$. 

b Refer to the histogram obtained in Exercise 7.1(d). How does the relative frequency with 
which you observed $\bar{Y} = 2$ compare to your answer to part (a)? 

c If you were to generate 10,000 values of $\bar{Y}$, what do you expect to obtain for the relative 
frequency of observing $\bar{Y} = 2$? 


Applet Exercise Refer to Exercise 7.1. Use the applet DiceSample and scroll down to the 
next part of the screen that corresponds to taking samples of size $n = 12$ from the population 
corresponding to tossing a balanced die. 


a Take a single sample of size $n = 12$ by clicking the button "Roll One Set." Use the button 
"Roll One Set" to generate nine more values of the sample mean. How does the histogram of 
observed values of the sample mean compare to the histogram observed in Exercise 7.1(c) 
that was based on ten samples each of size 3? 

b Use the button "Roll 10 Sets" nine more times until you have obtained and plotted 100 
realized values (each based on a sample of size $n = 12$) for the sample mean $\bar{Y}$. Click 
on the button "Show Stats" to see the mean and standard deviation of the 100 values 
($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$) that you observed. 


i How does the average of these 100 values of $\bar{y}_i$, $i = 1, 2, \ldots, 100$ compare to the 
average of the 100 values (based on samples of size $n = 3$) that you obtained in 
Exercise 7.1(d)? 

ii Divide the standard deviation of the 100 values of $\bar{y}_i$, $i = 1, 2, \ldots, 100$ based on 
samples of size 12 that you just obtained by the standard deviation of the 100 values 
(based on samples of size $n = 3$) that you obtained in Exercise 7.1. Why do you expect 
to get a value close to 1/2? [Hint: $V(\bar{Y}) = \sigma^2/n$.] 


c Click on the button "Toggle Normal." The (green) continuous density function plotted over 
the histogram is that of a normal random variable with mean and standard deviation equal 
to the mean and standard deviation of the 100 values, ($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$), plotted on the 
histogram. Does this normal distribution appear to reasonably approximate the distribution 
described by the histogram? 


Applet Exercise The population corresponding to the upper face observed on a single toss of a 
balanced die is such that all six possible values are equally likely. Would the results analogous 
to those obtained in Exercises 7.1 and 7.2 be observed if the die was not balanced? Access the 
applet DiceSample and scroll down to the part of the screen dealing with “Loaded Die." 


a If the die is loaded, the six possible outcomes are not equally likely. What are the probabilities 
associated with each outcome? Click on the buttons "1 roll," "10 rolls," and/or "1000 
rolls" until you have a good idea of the probabilities associated with the values 1, 2, 3, 4, 
5, and 6. What is the general shape of the histogram that you obtained? 

b Click the button "Show Stats" to see the true values of the probabilities of the six possible 
values. If $Y$ is the random variable denoting the number of spots on the uppermost face, 
what is the value for $\mu = E(Y)$? What is the value of $\sigma$, the standard deviation of $Y$? [Hint: 
These values appear on the "Stat Report" screen.] 

c How many times did you simulate rolling the die in part (a)? How do the mean and standard 
deviation of the values that you simulated compare to the true values $\mu = E(Y)$ and $\sigma$? 
Simulate 2000 more rolls and answer the same question. 


d Scroll down to the portion of the screen labeled "Rolling 3 Loaded Dice." Click the button 
"Roll 1000 Sets" until you have generated 3000 observed values for the random variable $\bar{Y}$. 

i What is the general shape of the simulated sampling distribution that you obtained? 

ii How does the mean of the 3000 values $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{3000}$ compare to the value of 
$\mu = E(Y)$ computed in part (a)? How does the standard deviation of the 3000 values compare 
to $\sigma/\sqrt{3}$? 

e Scroll down to the portion of the screen labeled "Rolling 12 Loaded Dice." 

i In part (ii), you will use the applet to generate 3000 samples of size 12, compute the 
mean of each observed sample, and plot these means on a histogram. Before using the 
applet, predict the approximate value that you will obtain for the mean and standard 
deviation of the 3000 values of $\bar{y}$ that you are about to generate. 

ii Use the applet to generate 3000 samples of size 12 and obtain the histogram associated 
with the respective sample means, $\bar{y}_i$, $i = 1, 2, \ldots, 3000$. What is the general shape 
of the simulated sampling distribution that you obtained? Compare the shape of this 
simulated sampling distribution with the one you obtained in part (d). 

iii Click the button "Show Stats" to observe the mean and standard deviation of the 3000 
values $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{3000}$. How do these values compare to those you predicted in part 
(i)? 


Applet Exercise What does the sampling distribution of the sample mean look like if samples 
are taken from an approximately normal distribution? Use the applet Sampling Distribution 
of the Mean (at www.thomsonedu.com/statistics/wackerly) to complete the following. The 
population to be sampled is approximately normally distributed with $\mu = 16.50$ and $\sigma = 6.03$ 
(these values are given above the population histogram and denoted M and S, respectively). 


a Use the button "Next Obs" to select a single value from the approximately normal population. 
Click the button four more times to complete a sample of size 5. What value did 
you obtain for the mean of this sample? Locate this value on the bottom histogram (the 
histogram for the values of $\bar{Y}$). 

b Click the button "Reset" to clear the middle graph. Click the button "Next Obs" five more 
times to obtain another sample of size 5 from the population. What value did you obtain for 
the mean of this new sample? Is the value that you obtained equal to the value you obtained 
in part (a)? Why or why not? 

c Use the button "1 Sample" eight more times to obtain a total of ten values of the sample 
mean. Look at the histogram of these ten means. 

i What do you observe? 

ii How does the mean of these 10 $\bar{y}$-values compare to the population mean $\mu$? 

d Use the button "1 Sample" until you have obtained and plotted 25 realized values for the 
sample mean $\bar{Y}$, each based on a sample of size 5. 

i What do you observe about the shape of the histogram of the 25 values of $\bar{y}_i$, 
$i = 1, 2, \ldots, 25$? 

ii How does the value of the standard deviation of the 25 $\bar{y}$ values compare with the 
theoretical value for $\sigma_{\bar{Y}}$ obtained in Example 5.27 where we showed that, if $\bar{Y}$ is 
computed based on a sample of size $n$, then $V(\bar{Y}) = \sigma^2/n$? 

e Click the button "1000 Samples" a few times, observing changes to the histogram as you 
generate more and more realized values of the sample mean. What do you observe about 
the shape of the resulting histogram for the simulated sampling distribution of $\bar{Y}$? 

f Click the button "Toggle Normal" to overlay (in green) the normal distribution with 
the same mean and standard deviation as the set of values of $\bar{Y}$ that you previously 
generated. Does this normal distribution appear to be a good approximation to the sampling 
distribution of $\bar{Y}$? 


Applet Exercise What is the effect of the sample size on the sampling distribution of $\bar{Y}$? 
Use the applet SampleSize to complete the following. As in Exercise 7.5, the population to be 
sampled is approximately normally distributed with $\mu = 16.50$ and $\sigma = 6.03$ (these values 
are given above the population histogram and denoted M and S, respectively). 


a Use the up/down arrows in the left "Sample Size" box to select one of the small sample 
sizes that are available and the arrows in the right "Sample Size" box to select a larger 
sample size. 

b Click the button "1 Sample" a few times. What is similar about the two histograms that 
you generated? What is different about them? 

c Click the button "1000 Samples" a few times and answer the questions in part (b). 

d Are the means and standard deviations of the two sampling distributions close to the values 
that you expected? [Hint: $V(\bar{Y}) = \sigma^2/n$.] 

e Click the button "Toggle Normal." What do you observe about the adequacy of the approximating 
normal distributions? 


Applet Exercise What does the sampling distribution of the sample variance look like if we 
sample from a population with an approximately normal distribution? Find out using the applet 
Sampling Distribution of the Variance (Mound Shaped Population) (at www.thomsonedu.com/ 
statistics/wackerly) to complete the following. 


a Click the button "Next Obs" to take a sample of size 1 from the population with distribution 
represented by the top histogram. The value obtained is plotted on the middle histogram. 
Click four more times to complete a sample of size 5. The value of the sample variance is 
computed and given above the middle histogram. Is the value of the sample variance equal 
to the value of the population variance? Does this surprise you? 

b When you completed part (a), the value of the sample variance was also plotted on the 
lowest histogram. Click the button "Reset" and repeat the process in part (a) to generate 
a second observed value for the sample variance. Did you obtain the same value as you 
observed in part (a)? Why or why not? 

c Click the button "1 Sample" a few times. You will observe that different samples lead 
to different values of the sample variance. Click the button "1000 Samples" a few times 
to quickly generate a histogram of the observed values of the sample variance (based on 
samples of size 5). What is the mean of the values of the sample variance that you generated? 
Is this mean close to the value of the population variance? 

d In the previous exercises in this section, you obtained simulated sampling distributions for 
the sample mean. All these sampling distributions were well approximated (for large sample 
sizes) by a normal distribution. Although the distribution that you obtained is mound-shaped, 
does the sampling distribution of the sample variance seem to be symmetric (like 
the normal distribution)? 

e Click the button "Toggle Theory" to overlay the theoretical density function for the sampling 
distribution of the variance of a sample of size 5 from a normally distributed population. 
Does the theoretical density provide a reasonable approximation to the values represented 
in the histogram? 

f Theorem 7.3, in the next section, states that if a random sample of size $n$ is taken from 
a normally distributed population, then $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with $(n - 1)$ 
degrees of freedom. Does this result seem consistent with what you observed in parts (d) 
and (e)? 

Applet Exercise What is the effect of the sample size on the sampling distribution of $S^2$? 
Use the applet VarianceSize to complete the following. As in some previous exercises, the 
population to be sampled is approximately normally distributed with $\mu = 16.50$ and $\sigma = 6.03$. 


a What is the value of the population variance $\sigma^2$? 


b Use the up/down arrows in the left "Sample Size" box to select one of the small sample 
sizes that are available and the arrows in the right "Sample Size" box to select a larger 
sample size. 


i Click the button "1 Sample" a few times. What is similar about the two histograms that 
you generated? What is different about them? 
ii Click the button "1000 Samples" a few times and answer the questions in part (i). 
iii Are the means of the two sampling distributions close to the value of the population 
variance? Which of the two sampling distributions exhibits smaller variability? 
iv Click the button "Toggle Theory." What do you observe about the adequacy of the 
approximating theoretical distributions? 


c Select sample sizes of 10 and 50 for a new simulation and click the button "1000 Samples" 
a few times. 


i Which of the sampling distributions appears to be more similar to a normal distribution? 
ii Refer to Exercise 7.7(f). In Exercise 7.97, you will show that, for a large number of 
degrees of freedom, the $\chi^2$ distribution can be approximated by a normal distribution. 
Does this seem reasonable based on your current simulation? 


Sampling Distributions Related 
to the Normal Distribution 


We have already noted that many phenomena observed in the real world have relative 
frequency distributions that can be modeled adequately by a normal probability 
distribution. Thus, in many applied problems, it is reasonable to assume that the observable 
random variables in a random sample, $Y_1, Y_2, \ldots, Y_n$, are independent with 
the same normal density function. In Exercise 6.43, you established that the statistic 
$\bar{Y} = (1/n)(Y_1 + Y_2 + \cdots + Y_n)$ actually has a normal distribution. Because this result 
is used so often in our subsequent discussions, we present it formally in the following 
theorem. 


THEOREM 7.1 

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal distribution 
with mean $\mu$ and variance $\sigma^2$. Then 

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

is normally distributed with mean $\mu_{\bar{Y}} = \mu$ and variance $\sigma^2_{\bar{Y}} = \sigma^2/n$. 


Proof 

Because $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with 
mean $\mu$ and variance $\sigma^2$, $Y_i$, $i = 1, 2, \ldots, n$, are independent, normally distributed 
variables, with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Further, 

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{1}{n}(Y_1) + \frac{1}{n}(Y_2) + \cdots + \frac{1}{n}(Y_n) = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n, \quad \text{where } a_i = 1/n,\ i = 1, 2, \ldots, n.$$

Thus, $\bar{Y}$ is a linear combination of $Y_1, Y_2, \ldots, Y_n$, and Theorem 6.3 can be 
applied to conclude that $\bar{Y}$ is normally distributed with 

$$E(\bar{Y}) = E\left[\frac{1}{n}(Y_1) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n}(\mu) + \cdots + \frac{1}{n}(\mu) = \mu$$

and 

$$V(\bar{Y}) = V\left[\frac{1}{n}(Y_1) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n^2}(\sigma^2) + \cdots + \frac{1}{n^2}(\sigma^2) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.$$

That is, the sampling distribution of $\bar{Y}$ is normal with mean $\mu_{\bar{Y}} = \mu$ and variance 
$\sigma^2_{\bar{Y}} = \sigma^2/n$. 


Notice that the variance of each of the random variables $Y_1, Y_2, \ldots, Y_n$ is $\sigma^2$ and 
the variance of the sampling distribution of the random variable $\bar{Y}$ is $\sigma^2/n$. In the 
discussions that follow, we will have occasion to refer to both of these variances. The 
notation $\sigma^2$ will be retained for the variance of the random variables $Y_1, Y_2, \ldots, Y_n$, 
and $\sigma^2_{\bar{Y}}$ will be used to denote the variance of the sampling distribution of the random 
variable $\bar{Y}$. Analogously, $\sigma$ will be retained as the notation for the standard deviation 
of the $Y_i$'s, and the standard deviation of the sampling distribution of $\bar{Y}$ is denoted $\sigma_{\bar{Y}}$. 

Under the conditions of Theorem 7.1, $\bar{Y}$ is normally distributed with mean $\mu_{\bar{Y}} = \mu$ 
and variance $\sigma^2_{\bar{Y}} = \sigma^2/n$. It follows that 

$$Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right)$$

has a standard normal distribution. We will illustrate the use of Theorem 7.1 in the 
following example. 
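
Before turning to that example, the conclusion of Theorem 7.1 can be checked informally by simulation. The sketch below is an illustration only; the parameter values ($\mu = 10$, $\sigma = 2$, $n = 16$), the seed, and the function name are hypothetical. It draws many normal samples, computes $\bar{Y}$ for each, and compares the variance of the simulated means with $\sigma^2/n$.

```python
import math
import random
import statistics

def simulate_normal_means(mu, sigma, n, reps=20000, seed=7):
    """Return reps simulated values of Ybar, each from a normal sample of size n."""
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

# Hypothetical population parameters and sample size (illustrative only).
mu, sigma, n = 10.0, 2.0, 16
ybars = simulate_normal_means(mu, sigma, n)

# Standardize each simulated mean: Z = (Ybar - mu) / (sigma / sqrt(n)).
z = [(yb - mu) / (sigma / math.sqrt(n)) for yb in ybars]

print("mean of Ybar values:", round(statistics.mean(ybars), 3))
print("variance of Ybar values:", round(statistics.variance(ybars), 4),
      "versus sigma^2/n =", sigma ** 2 / n)
print("mean and st. dev. of Z:", round(statistics.mean(z), 3),
      round(statistics.stdev(z), 3))
```

If the theorem holds, the standardized values should have mean near 0 and standard deviation near 1, up to simulation error.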


EXAMPLE 7.2 


A bottling machine can be regulated so that it discharges an average of $\mu$ ounces per 
bottle. It has been observed that the amount of fill dispensed by the machine is normally 
distributed with $\sigma = 1.0$ ounce. A sample of $n = 9$ filled bottles is randomly selected 
from the output of the machine on a given day (all bottled with the same machine 
setting), and the ounces of fill are measured for each. Find the probability that the 
sample mean will be within .3 ounce of the true mean $\mu$ for the chosen machine setting. 


Solution 

If $Y_1, Y_2, \ldots, Y_9$ denote the ounces of fill to be observed, then we know that the 
$Y_i$'s are normally distributed with mean $\mu$ and variance $\sigma^2 = 1$ for $i = 1, 2, \ldots, 9$. 
Therefore, by Theorem 7.1, $\bar{Y}$ possesses a normal sampling distribution with mean 
$\mu_{\bar{Y}} = \mu$ and variance $\sigma^2_{\bar{Y}} = \sigma^2/n = 1/9$. We want to find 

$$P(|\bar{Y} - \mu| \le .3) = P[-.3 \le (\bar{Y} - \mu) \le .3] = P\left(-\frac{.3}{\sigma/\sqrt{n}} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le \frac{.3}{\sigma/\sqrt{n}}\right).$$

Because $(\bar{Y} - \mu_{\bar{Y}})/\sigma_{\bar{Y}} = (\bar{Y} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution, it follows 
that 

$$P(|\bar{Y} - \mu| \le .3) = P\left(-\frac{.3}{1/\sqrt{9}} \le Z \le \frac{.3}{1/\sqrt{9}}\right) = P(-.9 \le Z \le .9).$$

Using Table 4, Appendix 3, we find 

$$P(-.9 \le Z \le .9) = 1 - 2P(Z > .9) = 1 - 2(.1841) = .6318.$$

Thus, the probability is only .6318 that the sample mean will be within .3 ounce of 
the true population mean. 
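
The table lookup in Example 7.2 can be reproduced numerically. The sketch below is not part of the original solution; it assumes Python's standard-library `statistics.NormalDist` class, and the helper name `prob_mean_within` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_within(delta, sigma, n):
    """P(|Ybar - mu| <= delta) when sampling n values from a normal population."""
    # By Theorem 7.1, Ybar - mu is normal with mean 0 and st. dev. sigma / sqrt(n).
    ybar_error = NormalDist(mu=0.0, sigma=sigma / sqrt(n))
    return ybar_error.cdf(delta) - ybar_error.cdf(-delta)

p = prob_mean_within(delta=0.3, sigma=1.0, n=9)
print(round(p, 4))
```

The computed probability agrees with the table-based value .6318 to about three decimal places; any small difference comes from rounding in the table entry .1841.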


EXAMPLE 7.3


Refer to Example 7.2. How many observations should be included in the sample if
we wish $\bar{Y}$ to be within .3 ounce of $\mu$ with probability .95?


Solution


Now we want

$$P(|\bar{Y} - \mu| \le .3) = P[-.3 \le (\bar{Y} - \mu) \le .3] = .95.$$

Dividing each term of the inequality by $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ (recall that $\sigma = 1$), we have

$$P\left[\frac{-.3}{\sigma/\sqrt{n}} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le \frac{.3}{\sigma/\sqrt{n}}\right]
= P(-.3\sqrt{n} \le Z \le .3\sqrt{n}) = .95.$$

But using Table 4, Appendix 3, we obtain

$$P(-1.96 \le Z \le 1.96) = .95.$$

It must follow that

$$.3\sqrt{n} = 1.96 \quad \text{or, equivalently,} \quad n = \left(\frac{1.96}{.3}\right)^2 = 42.68.$$

From a practical perspective, it is impossible to take a sample of size 42.68. Our
solution indicates that a sample of size 42 is not quite large enough to reach our
objective. If $n = 43$, $P(|\bar{Y} - \mu| \le .3)$ slightly exceeds .95.
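
The sample-size calculation of Example 7.3 can be sketched in a few lines. This is an illustration under the same assumptions as the example (normal population, known $\sigma$); the helper names `required_sample_size` and `coverage` are hypothetical, and the standard-library `NormalDist.inv_cdf` stands in for the table value 1.96.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(delta, sigma, prob):
    """Return (exact n, smallest integer n) so that
    P(|Ybar - mu| <= delta) is at least prob."""
    z = NormalDist().inv_cdf((1 + prob) / 2)   # about 1.96 when prob = .95
    n_exact = (z * sigma / delta) ** 2         # solve delta * sqrt(n) / sigma = z
    return n_exact, ceil(n_exact)


def coverage(delta, sigma, n):
    """P(|Ybar - mu| <= delta) for a given sample size n."""
    z = delta * sqrt(n) / sigma
    return 2 * NormalDist().cdf(z) - 1


n_exact, n_needed = required_sample_size(delta=0.3, sigma=1.0, prob=0.95)
print(round(n_exact, 2), n_needed)
print(coverage(0.3, 1.0, 42) < 0.95 < coverage(0.3, 1.0, 43))
```

Rounding up, rather than to the nearest integer, is the natural design choice here, because the probability requirement must be met or exceeded.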
In succeeding chapters we will be interested in statistics that are functions of
the squares of the observations in a random sample from a normal population.
Theorem 7.2 establishes the sampling distribution of the sum of the squares of
independent, standard normal random variables.


THEOREM 7.2


Let $Y_1, Y_2, \ldots, Y_n$ be defined as in Theorem 7.1. Then $Z_i = (Y_i - \mu)/\sigma$ are
independent, standard normal random variables, $i = 1, 2, \ldots, n$, and

$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{Y_i - \mu}{\sigma}\right)^2$$

has a $\chi^2$ distribution with $n$ degrees of freedom (df).


Proof


Because $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with
mean $\mu$ and variance $\sigma^2$, Example 6.10 implies that $Z_i = (Y_i - \mu)/\sigma$ has a
standard normal distribution for $i = 1, 2, \ldots, n$. Further, the random variables
$Z_i$ are independent because the random variables $Y_i$ are independent, $i = 1,
2, \ldots, n$. The fact that $\sum_{i=1}^{n} Z_i^2$ has a $\chi^2$ distribution with $n$ df follows directly
from Theorem 6.4.


FIGURE 7.2 A $\chi^2$ distribution showing upper-tail area $\alpha$


From Table 6, Appendix 3, we can find values $\chi_\alpha^2$ so that

$$P(\chi^2 > \chi_\alpha^2) = \alpha$$

for random variables with $\chi^2$ distributions (see Figure 7.2). For example, if the $\chi^2$
random variable of interest has 10 df, Table 6, Appendix 3, can be used to find $\chi_{.90}^2$.
To do so, look in the row labeled 10 df and the column headed $\chi_{.90}^2$ and read the value
4.86518. Therefore, if $Y$ has a $\chi^2$ distribution with 10 df, $P(Y > 4.86518) = .90$.
It follows that $P(Y \le 4.86518) = .10$ and that 4.86518 is the .10 quantile, $\phi_{.10}$, of a
$\chi^2$ random variable with 10 df. In general,

$$P(\chi^2 > \chi_\alpha^2) = \alpha \quad \text{implies that} \quad P(\chi^2 \le \chi_\alpha^2) = 1 - \alpha$$

and that $\chi_\alpha^2 = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile of the $\chi^2$ random variable.

Table 6, Appendix 3, contains $\chi_\alpha^2 = \phi_{1-\alpha}$ for ten values of $\alpha$ (.005, .01, .025, .05,
.1, .90, .95, .975, .99 and .995) for each of 37 different $\chi^2$ distributions (those with
degrees of freedom 1, 2, ..., 30 and 40, 50, 60, 70, 80, 90 and 100). Considerably
more information about these distributions, and those associated with degrees of
freedom not covered in the table, is provided by available statistical software. If $Y$
has a $\chi^2$ distribution with $\nu$ df, the R (and S-Plus) command pchisq($y_0$, $\nu$) gives
$P(Y \le y_0)$ whereas qchisq($p$, $\nu$) yields the $p$th quantile, the value $\phi_p$ such that
$P(Y \le \phi_p) = p$. Probabilities and quantiles associated with $\chi^2$ random variables
are also easily obtained using the Chi-Square Probabilities and Quantiles applet
(accessible at www.thomsonedu.com/statistics/wackerly).

The following example illustrates the combined use of Theorem 7.2 and the $\chi^2$
tables.


EXAMPLE 7.4


If $Z_1, Z_2, \ldots, Z_6$ denotes a random sample from the standard normal distribution,
find a number $b$ such that

$$P\left(\sum_{i=1}^{6} Z_i^2 \le b\right) = .95.$$


Solution


By Theorem 7.2, $\sum_{i=1}^{6} Z_i^2$ has a $\chi^2$ distribution with 6 df. Looking at Table 6,
Appendix 3, in the row headed 6 df and the column headed $\chi_{.05}^2$, we see the number
12.5916. Thus,

$$P\left(\sum_{i=1}^{6} Z_i^2 > 12.5916\right) = .05, \quad \text{or, equivalently,} \quad
P\left(\sum_{i=1}^{6} Z_i^2 \le 12.5916\right) = .95,$$

and $b = 12.5916$ is the .95 quantile (95th percentile) of the sum of the squares of six
independent standard normal random variables.
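
For an even number of degrees of freedom, the upper-tail area of a $\chi^2$ distribution has a simple closed form, $P(\chi^2 > x) = e^{-x/2} \sum_{k=0}^{\nu/2 - 1} (x/2)^k / k!$, which is a standard identity and not a result stated in this section. The sketch below uses it to confirm the two table values quoted above; the function name `chi2_upper_tail_even_df` is our own, and the code is an illustration rather than a substitute for Table 6 or for pchisq.

```python
from math import exp


def chi2_upper_tail_even_df(x, df):
    """P(Y > x) when Y is chi-square with an even number df of degrees
    of freedom, using the closed-form (Poisson-sum) identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0      # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


# Example 7.4: 6 df, b = 12.5916 should cut off an upper-tail area of .05.
tail_6 = chi2_upper_tail_even_df(12.5916, 6)
# Table 6 illustration: 10 df, 4.86518 should cut off an upper-tail area of .90.
tail_10 = chi2_upper_tail_even_df(4.86518, 10)
print(round(tail_6, 4), round(tail_10, 4))
```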


The $\chi^2$ distribution plays an important role in many inferential procedures. For
example, suppose that we wish to make an inference about the population variance
$\sigma^2$ based on a random sample $Y_1, Y_2, \ldots, Y_n$ from a normal population. As we will
show in Chapter 8, a good estimator of $\sigma^2$ is the sample variance

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

The following theorem gives the probability distribution for a function of the
statistic $S^2$.


THEOREM 7.3


Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean
$\mu$ and variance $\sigma^2$. Then

$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

has a $\chi^2$ distribution with $(n - 1)$ df. Also, $\bar{Y}$ and $S^2$ are independent random
variables.


Proof


The complete proof of this theorem is outlined in Exercise 13.93. To make the
general result more plausible, we will consider the case $n = 2$ and show that
$(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with 1 df. In the case $n = 2$,

$$\bar{Y} = (1/2)(Y_1 + Y_2),$$

and, therefore,

$$S^2 = \frac{1}{2-1} \sum_{i=1}^{2} (Y_i - \bar{Y})^2
= \left[Y_1 - \frac{1}{2}(Y_1 + Y_2)\right]^2 + \left[Y_2 - \frac{1}{2}(Y_1 + Y_2)\right]^2$$

$$= \left[\frac{1}{2}(Y_1 - Y_2)\right]^2 + \left[\frac{1}{2}(Y_2 - Y_1)\right]^2
= 2\left[\frac{1}{2}(Y_1 - Y_2)\right]^2 = \frac{(Y_1 - Y_2)^2}{2}.$$

It follows that, when $n = 2$,

$$\frac{(n-1)S^2}{\sigma^2} = \frac{(Y_1 - Y_2)^2}{2\sigma^2} = \left(\frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}\right)^2.$$

We will show that this quantity is equal to the square of a standard normal
random variable; that is, it is a $Z^2$, which, as we have already shown in Example
6.11, possesses a $\chi^2$ distribution with 1 df.

Because $Y_1 - Y_2$ is a linear combination of independent, normally distributed
random variables ($Y_1 - Y_2 = a_1 Y_1 + a_2 Y_2$ with $a_1 = 1$ and $a_2 = -1$), Theorem
6.3 tells us that $Y_1 - Y_2$ has a normal distribution with mean $1\mu - 1\mu = 0$ and
variance $(1)^2\sigma^2 + (-1)^2\sigma^2 = 2\sigma^2$. Therefore,

$$Z = \frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}$$

has a standard normal distribution. Because for $n = 2$

$$\frac{(n-1)S^2}{\sigma^2} = \left(\frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}\right)^2 = Z^2,$$

it follows that $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with 1 df.

In Example 6.13, we proved that $U_1 = (Y_1 + Y_2)/\sigma$ and $U_2 = (Y_1 - Y_2)/\sigma$
are independent random variables. Notice that, because $n = 2$,

$$\bar{Y} = \frac{Y_1 + Y_2}{2} = \frac{\sigma U_1}{2} \quad \text{and} \quad
S^2 = \frac{(Y_1 - Y_2)^2}{2} = \frac{(\sigma U_2)^2}{2}.$$

Because $\bar{Y}$ is a function of only $U_1$ and $S^2$ is a function of only $U_2$, the
independence of $U_1$ and $U_2$ implies the independence of $\bar{Y}$ and $S^2$.
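
Theorem 7.3 can be made more plausible for larger $n$ by simulation. The sketch below is only an empirical check, not a proof: it draws repeated normal samples, computes $(n - 1)S^2/\sigma^2$ for each, and compares the simulated mean and variance with the values $n - 1$ and $2(n - 1)$ that a $\chi^2$ distribution with $n - 1$ df would have. The sample size, the value $\mu = 12$, the number of repetitions, and all function names are hypothetical choices made for illustration. A sample correlation near zero between $\bar{Y}$ and $S^2$ is consistent with independence but does not by itself establish it.

```python
import random
from math import sqrt
from statistics import fmean


def sample_variance(xs):
    """Unbiased sample variance S^2 with divisor n - 1."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_mean_and_variance(n, mu, sigma, reps, seed=0):
    """Return lists of Ybar and (n - 1) S^2 / sigma^2 over reps samples."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    ybars, scaled_vars = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybars.append(fmean(sample))
        scaled_vars.append((n - 1) * sample_variance(sample) / sigma ** 2)
    return ybars, scaled_vars


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ybars, scaled_vars = simulate_mean_and_variance(n=10, mu=12.0, sigma=1.0, reps=20_000)
mean_w = fmean(scaled_vars)            # should be near n - 1 = 9
var_w = sample_variance(scaled_vars)   # should be near 2(n - 1) = 18
corr = sample_correlation(ybars, scaled_vars)
print(round(mean_w, 2), round(var_w, 2), round(corr, 3))
```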
EXAMPLE 7.5


In Example 7.2, the ounces of fill from the bottling machine are assumed to have a
normal distribution with $\sigma^2 = 1$. Suppose that we plan to select a random sample
of ten bottles and measure the amount of fill in each bottle. If these ten observations
are used to calculate $S^2$, it might be useful to specify an interval of values that will
include $S^2$ with a high probability. Find numbers $b_1$ and $b_2$ such that

$$P(b_1 \le S^2 \le b_2) = .90.$$


Solution


Notice that

$$P(b_1 \le S^2 \le b_2) = P\left[\frac{(n-1)b_1}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)b_2}{\sigma^2}\right].$$

Because $\sigma^2 = 1$, it follows that $(n - 1)S^2/\sigma^2 = (n - 1)S^2$ has a $\chi^2$ distribution with
$(n - 1)$ df. Therefore, we can use Table 6, Appendix 3, to find two numbers $a_1$ and
$a_2$ such that

$$P[a_1 \le (n - 1)S^2 \le a_2] = .90.$$

One method of doing this is to find the value of $a_2$ that cuts off an area of .05 in
the upper tail and the value of $a_1$ that cuts off .05 in the lower tail (.95 in the upper
tail). Because there are $n - 1 = 9$ df, Table 6, Appendix 3, gives $a_2 = 16.919$ and
$a_1 = 3.325$. Consequently, values for $b_1$ and $b_2$ that satisfy our requirements are
given by

$$3.325 = a_1 = \frac{(n-1)b_1}{\sigma^2} = 9b_1, \quad \text{or} \quad b_1 = \frac{3.325}{9} = .369, \quad \text{and}$$

$$16.919 = a_2 = \frac{(n-1)b_2}{\sigma^2} = 9b_2, \quad \text{or} \quad b_2 = \frac{16.919}{9} = 1.880.$$

Thus, if we wish to have an interval that will include $S^2$ with probability .90, one such
interval is (.369, 1.880). Notice that this interval is fairly wide.
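
The interval of Example 7.5 can be reproduced without the table. The sketch below is one possible approach, not the method of the text: it evaluates the $\chi^2$ upper-tail area for integer degrees of freedom using the standard finite-sum formulas (a different form for even and odd df) and then finds quantiles by bisection. The function names `chi2_upper_tail` and `chi2_quantile` are hypothetical; in R the same quantiles would come from qchisq.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """P(Y > x) for Y chi-square with positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half, term, total = x / 2.0, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: 2 Q(r) + 2 phi(r) * sum_{j=1}^{(df-1)/2} r^(2j-1) / (1*3*...*(2j-1)),
    # where r = sqrt(x), Q is the normal upper tail and phi the normal density.
    root = sqrt(x)
    density = exp(-x / 2.0) / sqrt(2.0 * pi)
    term, total = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return 2.0 * (1.0 - NormalDist().cdf(root)) + 2.0 * density * total


def chi2_quantile(p, df, tol=1e-10):
    """Value q with P(Y <= q) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > 1.0 - p:   # expand until q is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > 1.0 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n, sigma_sq = 10, 1.0
a1 = chi2_quantile(0.05, n - 1)   # cuts off .05 in the lower tail
a2 = chi2_quantile(0.95, n - 1)   # cuts off .05 in the upper tail
b1 = a1 * sigma_sq / (n - 1)
b2 = a2 * sigma_sq / (n - 1)
print(round(a1, 3), round(a2, 3), round(b1, 3), round(b2, 3))
```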


The result given in Theorem 7.1 provides the basis for development of inference-
making procedures about the mean $\mu$ of a normal population with known variance
$\sigma^2$. In that case, Theorem 7.1 tells us that $\sqrt{n}(\bar{Y} - \mu)/\sigma$ has a standard normal
distribution. When $\sigma$ is unknown, it can be estimated by $S = \sqrt{S^2}$, and the quantity

$$\sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right)$$

provides the basis for developing methods for inferences about $\mu$. We will show that
$\sqrt{n}(\bar{Y} - \mu)/S$ has a distribution known as Student's t distribution with $n - 1$ df. The
general definition of a random variable that possesses a Student's t distribution (or
simply a t distribution) is as follows.


DEFINITION 7.2


Let $Z$ be a standard normal random variable and let $W$ be a $\chi^2$-distributed
variable with $\nu$ df. Then, if $Z$ and $W$ are independent,

$$T = \frac{Z}{\sqrt{W/\nu}}$$

is said to have a t distribution with $\nu$ df.


FIGURE 7.3 A comparison of the standard normal and t density functions.


If $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal population with mean
$\mu$ and variance $\sigma^2$, Theorem 7.1 may be applied to show that $Z = \sqrt{n}(\bar{Y} - \mu)/\sigma$ has a
standard normal distribution. Theorem 7.3 tells us that $W = (n - 1)S^2/\sigma^2$ has a $\chi^2$
distribution with $\nu = n - 1$ df and that $Z$ and $W$ are independent (because $\bar{Y}$ and $S^2$
are independent). Therefore, by Definition 7.2,

$$T = \frac{Z}{\sqrt{W/\nu}} = \frac{\sqrt{n}(\bar{Y} - \mu)/\sigma}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} = \sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right)$$

has a t distribution with $(n - 1)$ df.

The equation for the t density function will not be given here, but it can be found
in Exercise 7.98 where hints about its derivation are given. Like the standard normal
density function, the t density function is symmetric about zero. Further, for $\nu > 1$,
$E(T) = 0$; and for $\nu > 2$, $V(T) = \nu/(\nu - 2)$. These results follow directly from
results developed in Exercises 4.111 and 4.112 (see Exercise 7.30). Thus, we see that,
if $\nu > 1$, a t-distributed random variable has the same expected value as a standard
normal random variable. However, a standard normal random variable always has
variance 1 whereas, if $\nu > 2$, the variance of a random variable with a t distribution
always exceeds 1.

A standard normal density function and a t density function are sketched in
Figure 7.3. Notice that both density functions are symmetric about the origin but
that the t density has more probability mass in its tails.

Values of $t_\alpha$ such that $P(T > t_\alpha) = \alpha$ are given in Table 5, Appendix 3. For
example, if a random variable has a t distribution with 21 df, $t_{.100}$ is found by looking
in the row labeled 21 df and the column headed $t_{.100}$. Using Table 5, we see that
$t_{.100} = 1.323$ and that for 21 df, $P(T > 1.323) = .100$. It follows that 1.323 is the
.90 quantile (the 90th percentile) of the t distribution with 21 df and in general that
$t_\alpha = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile [the $100(1 - \alpha)$th percentile] of a t-distributed random
variable.

Table 5, Appendix 3, contains $t_\alpha = \phi_{1-\alpha}$ for five values of $\alpha$ (.005, .010, .025,
.050 and .100) and 30 different t distributions (those with degrees of freedom 1,
2, ..., 29 and $\infty$). Considerably more information about these distributions, and
those associated with degrees of freedom not covered in the table, is provided by
available statistical software. If $Y$ has a t distribution with $\nu$ df, the R (and S-Plus)
command pt($y_0$, $\nu$) gives $P(Y \le y_0)$ whereas qt($p$, $\nu$) yields the $p$th
quantile, the value of $\phi_p$ such that $P(Y \le \phi_p) = p$. Probabilities and quantiles
associated with t-distributed random variables are also easily obtained using the
Student's t Probabilities and Quantiles applet (at www.thomsonedu.com/statistics/
wackerly).


EXAMPLE 7.6


The tensile strength for a type of wire is normally distributed with unknown mean
$\mu$ and unknown variance $\sigma^2$. Six pieces of wire were randomly selected from a
large roll; $Y_i$, the tensile strength for portion $i$, is measured for $i = 1, 2, \ldots, 6$.
The population mean $\mu$ and variance $\sigma^2$ can be estimated by $\bar{Y}$ and $S^2$, respectively.
Because $\sigma_{\bar{Y}}^2 = \sigma^2/n$, it follows that $\sigma_{\bar{Y}}^2$ can be estimated by $S^2/n$. Find the approximate
probability that $\bar{Y}$ will be within $2S/\sqrt{n}$ of the true population mean $\mu$.


Solution


We want to find

$$P\left[-\frac{2S}{\sqrt{n}} \le (\bar{Y} - \mu) \le \frac{2S}{\sqrt{n}}\right]
= P\left[-2 \le \sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right) \le 2\right]
= P(-2 \le T \le 2),$$

where $T$ has a t distribution with, in this case, $n - 1 = 5$ df. Looking at Table 5,
Appendix 3, we see that the upper-tail area to the right of 2.015 is .05. Hence,

$$P(-2.015 \le T \le 2.015) = .90,$$

and the probability that $\bar{Y}$ will be within 2 estimated standard deviations of $\mu$ is
slightly less than .90. In Exercise 7.24, the exact value for $P(-2 \le T \le 2)$
will be found using the Student's t Probabilities and Quantiles applet available at
www.thomsonedu.com/statistics/wackerly.

Notice that, if $\sigma^2$ were known, the probability that $\bar{Y}$ will fall within $2\sigma_{\bar{Y}}$ of $\mu$
would be given by

$$P\left[-2\left(\frac{\sigma}{\sqrt{n}}\right) \le (\bar{Y} - \mu) \le 2\left(\frac{\sigma}{\sqrt{n}}\right)\right]
= P\left[-2 \le \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right) \le 2\right]
= P(-2 \le Z \le 2) = .9544.$$
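
The comparison in Example 7.6 can be checked numerically. The sketch below assumes the standard form of the t density, $f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} (1 + t^2/\nu)^{-(\nu+1)/2}$, which this section defers to Exercise 7.98, and integrates it with composite Simpson's rule. The function names and the number of integration steps are our own illustrative choices; in R the same probability would come from pt.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2.0) / (sqrt(df * pi) * gamma(df / 2.0))
    return const * (1.0 + t * t / df) ** (-(df + 1) / 2.0)


def t_prob_between(a, b, df, steps=2000):
    """P(a <= T <= b) by composite Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(a + i * h, df)
    return total * h / 3.0


p_t = t_prob_between(-2.0, 2.0, df=5)                      # sigma estimated by S
p_z = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)       # sigma known
p_table = t_prob_between(-2.015, 2.015, df=5)              # table value check
print(round(p_t, 3), round(p_z, 3), round(p_table, 3))
```

The t probability falls just below .90 and well below the normal value, reflecting the heavier tails of the t density with 5 df.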


Suppose that we want to compare the variances of two normal populations based
on information contained in independent random samples from the two populations.
Samples of sizes $n_1$ and $n_2$ are taken from the two populations with variances $\sigma_1^2$
and $\sigma_2^2$, respectively. If we calculate $S_1^2$ from the observations in sample 1, then $S_1^2$
estimates $\sigma_1^2$. Similarly, $S_2^2$, calculated from the observations in the second sample,
estimates $\sigma_2^2$. Thus, it seems intuitive that the ratio $S_1^2/S_2^2$ could be used to make
inferences about the relative magnitudes of $\sigma_1^2$ and $\sigma_2^2$. If we divide each $S_i^2$ by $\sigma_i^2$,
then the resulting ratio

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \left(\frac{\sigma_2^2}{\sigma_1^2}\right)\left(\frac{S_1^2}{S_2^2}\right)$$

has an F distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$
denominator degrees of freedom. The general definition of a random variable that
possesses an F distribution appears next.


DEFINITION 7.3


Let $W_1$ and $W_2$ be independent $\chi^2$-distributed random variables with $\nu_1$ and $\nu_2$
df, respectively. Then

$$F = \frac{W_1/\nu_1}{W_2/\nu_2}$$

is said to have an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$
denominator degrees of freedom.


The density function for an F-distributed random variable is given in Exercise 7.99
where the method for its derivation is outlined. It can be shown (see Exercise 7.34)
that if $F$ possesses an F distribution with $\nu_1$ numerator and $\nu_2$ denominator degrees
of freedom, then $E(F) = \nu_2/(\nu_2 - 2)$ if $\nu_2 > 2$. Also, if $\nu_2 > 4$, then
$V(F) = [2\nu_2^2(\nu_1 + \nu_2 - 2)]/[\nu_1(\nu_2 - 2)^2(\nu_2 - 4)]$. Notice that the mean of an
F-distributed random variable depends only on the number of denominator degrees of
freedom, $\nu_2$.

Considering once again two independent random samples from normal distributions,
we know that $W_1 = (n_1 - 1)S_1^2/\sigma_1^2$ and $W_2 = (n_2 - 1)S_2^2/\sigma_2^2$ have independent
$\chi^2$ distributions with $\nu_1 = (n_1 - 1)$ and $\nu_2 = (n_2 - 1)$ df, respectively. Thus,
Definition 7.3 implies that

$$F = \frac{W_1/\nu_1}{W_2/\nu_2}
= \frac{[(n_1 - 1)S_1^2/\sigma_1^2]/(n_1 - 1)}{[(n_2 - 1)S_2^2/\sigma_2^2]/(n_2 - 1)}
= \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$
denominator degrees of freedom.

A typical F density function is sketched in Figure 7.4. Values of $F_\alpha$ such that
$P(F > F_\alpha) = \alpha$ are given in Table 7, Appendix 3, for values of $\alpha$ = .100, .050,
.025, .010, and .005. In Table 7, the column headings are the numerator degrees
of freedom whereas the denominator degrees of freedom are given in the main-row
headings. Opposite each denominator degrees of freedom (row heading), the values of
$\alpha$ = .100, .050, .025, .010, and .005 appear. For example, if the F variable of interest
has 5 numerator degrees of freedom and 7 denominator degrees of freedom, then
$F_{.100} = 2.88$, $F_{.050} = 3.97$, $F_{.025} = 5.29$, $F_{.010} = 7.46$, and $F_{.005} = 9.52$. Thus, if $F$
has an F distribution with 5 numerator degrees of freedom and 7 denominator degrees
of freedom, then $P(F > 7.46) = .01$. It follows that 7.46 is the .99 quantile of the
F distribution with 5 numerator degrees of freedom and 7 denominator degrees of
freedom. In general, $F_\alpha = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile [the $100(1 - \alpha)$th percentile]
of an F-distributed random variable.


FIGURE 7.4 A typical F probability density function

For the five previously mentioned values of $\alpha$, Table 7, Appendix 3, gives the values
of $F_\alpha$ for 646 different F distributions (those with numerator degrees of freedom 1,
2, ..., 10, 12, 15, 20, 24, 30, 40, 60, 120, and $\infty$, and denominator degrees of
freedom 1, 2, ..., 30, 40, 60, 120, and $\infty$). Considerably more information about these
distributions, and those associated with degrees of freedom not covered in the table,
is provided by available statistical software. If $Y$ has an F distribution with $\nu_1$
numerator degrees of freedom and $\nu_2$ denominator degrees of freedom, the R (and S-Plus)
command pf($y_0$, $\nu_1$, $\nu_2$) gives $P(Y \le y_0)$ whereas qf($p$, $\nu_1$, $\nu_2$) yields the $p$th
quantile, the value of $\phi_p$ such that $P(Y \le \phi_p) = p$. Probabilities and quantiles
associated with F-distributed random variables are also easily obtained using the F-Ratio
Probabilities and Quantiles applet (at www.thomsonedu.com/statistics/wackerly).


EXAMPLE 7.7


If we take independent samples of size $n_1 = 6$ and $n_2 = 10$ from two normal
populations with equal population variances, find the number $b$ such that

$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = .95.$$


Solution


Because $n_1 = 6$, $n_2 = 10$, and the population variances are equal, then

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2}{S_2^2}$$

has an F distribution with $\nu_1 = n_1 - 1 = 5$ numerator degrees of freedom and
$\nu_2 = n_2 - 1 = 9$ denominator degrees of freedom. Also,

$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = 1 - P\left(\frac{S_1^2}{S_2^2} > b\right).$$

Therefore, we want to find the number $b$ cutting off an upper-tail area of .05 under the
F density function with 5 numerator degrees of freedom and 9 denominator degrees
of freedom. Looking in column 5 and row 9 in Table 7, Appendix 3, we see that the
appropriate value of $b$ is 3.48.
Even when the population variances are equal, the probability that the ratio of
the sample variances exceeds 3.48 is still .05 (assuming sample sizes of $n_1 = 6$ and
$n_2 = 10$).
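
Definition 7.3 suggests a direct way to simulate an F-distributed random variable: generate two independent $\chi^2$ variables and form the ratio of each divided by its degrees of freedom. The sketch below is a Monte Carlo illustration only, with simulation error of a few units in the third decimal place. It relies on the fact that a $\chi^2$ variable with $\nu$ df is a gamma variable with $\alpha = \nu/2$ and $\beta = 2$; the function name, seed, and number of repetitions are hypothetical choices.

```python
import random


def simulate_f(df1, df2, reps, seed=0):
    """Draw reps values of F = (W1/df1) / (W2/df2), where W1 and W2 are
    independent chi-square variables generated as gamma(df/2, scale 2)."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    draws = []
    for _ in range(reps):
        w1 = rng.gammavariate(df1 / 2.0, 2.0)
        w2 = rng.gammavariate(df2 / 2.0, 2.0)
        draws.append((w1 / df1) / (w2 / df2))
    return draws


# Example 7.7: 5 numerator and 9 denominator degrees of freedom.
draws = simulate_f(5, 9, reps=200_000)
tail = sum(f > 3.48 for f in draws) / len(draws)   # should be near .05
mean_f = sum(draws) / len(draws)                   # should be near 9/7
print(round(tail, 3), round(mean_f, 2))
```

The simulated mean can be compared with $E(F) = \nu_2/(\nu_2 - 2) = 9/7$, which depends only on the denominator degrees of freedom.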


This section has been devoted to developing the sampling distributions of various
statistics calculated by using the observations in a random sample from a normal
population (or independent random samples from two normal populations). In particular,
if $Y_1, Y_2, \ldots, Y_n$ represents a random sample from a normal population with mean $\mu$
and variance $\sigma^2$, we have seen that $\sqrt{n}(\bar{Y} - \mu)/\sigma$ has a standard normal distribution.
Also, $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution, and $\sqrt{n}(\bar{Y} - \mu)/S$ has a t distribution (both
with $n - 1$ df). If we have two independent random samples from normal populations
with variances $\sigma_1^2$ and $\sigma_2^2$, then $F = (S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ has an F distribution.
These sampling distributions will enable us to evaluate the properties of inferential
procedures in later chapters. In the next section, we discuss approximations to certain
sampling distributions. These approximations can be very useful when the exact form
of the sampling distribution is unknown or when it is difficult or tedious to use the
exact sampling distribution to compute probabilities.


Exercises


7.9 Refer to Example 7.2. The amount of fill dispensed by a bottling machine is normally distributed
with $\sigma = 1$ ounce. If $n = 9$ bottles are randomly selected from the output of the machine,
we found that the probability that the sample mean will be within .3 ounce of the true mean is
.6318. Suppose that $\bar{Y}$ is to be computed using a sample of size $n$.

a If $n = 16$, what is $P(|\bar{Y} - \mu| \le .3)$?

b Find $P(|\bar{Y} - \mu| \le .3)$ when $\bar{Y}$ is to be computed using samples of sizes $n = 25$, $n = 36$,
$n = 49$, and $n = 64$.

c What pattern do you observe among the values for $P(|\bar{Y} - \mu| \le .3)$ that you observed for
the various values of $n$?

d Do the results that you obtained in part (b) seem to be consistent with the result obtained
in Example 7.3?


7.10 Refer to Exercise 7.9. Assume now that the amount of fill dispensed by the bottling machine
is normally distributed with $\sigma = 2$ ounces.

a If $n = 9$ bottles are randomly selected from the output of the machine, what is
$P(|\bar{Y} - \mu| \le .3)$? Compare this with the answer obtained in Example 7.2.

b Find $P(|\bar{Y} - \mu| \le .3)$ when $\bar{Y}$ is to be computed using samples of sizes $n = 25$, $n = 36$,
$n = 49$, and $n = 64$.

c What pattern do you observe among the values for $P(|\bar{Y} - \mu| \le .3)$ that you observed for
the various values of $n$?

d How do the respective probabilities obtained in this problem (where $\sigma = 2$) compare to
those obtained in Exercise 7.9 (where $\sigma = 1$)?


7.11 A forester studying the effects of fertilization on certain pine forests in the Southeast is
interested in estimating the average basal area of pine trees. In studying basal areas of similar trees
for many years, he has discovered that these measurements (in square inches) are normally
distributed with standard deviation approximately 4 square inches. If the forester samples
$n = 9$ trees, find the probability that the sample mean will be within 2 square inches of the
population mean.


7.12 Suppose the forester in Exercise 7.11 would like the sample mean to be within 1 square inch
of the population mean, with probability .90. How many trees must he measure in order to
ensure this degree of accuracy?


7.13 The Environmental Protection Agency is concerned with the problem of setting criteria for the
amounts of certain toxic chemicals to be allowed in freshwater lakes and rivers. A common
measure of toxicity for any pollutant is the concentration of the pollutant that will kill half of
the test species in a given amount of time (usually 96 hours for fish species). This measure is
called LC50 (lethal concentration killing 50% of the test species). In many studies, the values
contained in the natural logarithm of LC50 measurements are normally distributed, and, hence,
the analysis is based on ln(LC50) data.

Studies of the effects of copper on a certain species of fish (say, species A) show the variance
of ln(LC50) measurements to be around .4 with concentration measurements in milligrams per
liter. If $n = 10$ studies on LC50 for copper are to be completed, find the probability that the
sample mean of ln(LC50) will differ from the true population mean by no more than .5.


7.14 If in Exercise 7.13 we want the sample mean to differ from the population mean by no more
than .5 with probability .95, how many tests should be run?


7.15 Suppose that $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples, with the
variables $X_i$ normally distributed with mean $\mu_1$ and variance $\sigma_1^2$ and the variables $Y_i$ normally
distributed with mean $\mu_2$ and variance $\sigma_2^2$. The difference between the sample means, $\bar{X} - \bar{Y}$,
is then a linear combination of $m + n$ normally distributed random variables and, by Theorem
6.3, is itself normally distributed.

a Find $E(\bar{X} - \bar{Y})$.

b Find $V(\bar{X} - \bar{Y})$.

c Suppose that $\sigma_1^2 = 2$, $\sigma_2^2 = 2.5$, and $m = n$. Find the sample sizes so that $(\bar{X} - \bar{Y})$ will be
within 1 unit of $(\mu_1 - \mu_2)$ with probability .95.


7.16 Referring to Exercise 7.13, suppose that the effects of copper on a second species (say, species
B) of fish show the variance of ln(LC50) measurements to be .8. If the population means of
ln(LC50) for the two species are equal, find the probability that, with random samples of ten
measurements from each species, the sample mean for species A exceeds the sample mean for
species B by at least 1 unit.


7.17 Applet Exercise Refer to Example 7.4. Use the applet Chi-Square Probabilities and Quantiles
to find $P\left(\sum_{i=1}^{6} Z_i^2 \le 6\right)$. (Recall that $\sum_{i=1}^{6} Z_i^2$ has a $\chi^2$ distribution with 6 df.)


7.18 Applet Exercise Refer to Example 7.5. If $\sigma^2 = 1$ and $n = 10$, use the applet Chi-Square
Probabilities and Quantiles to find $P(S^2 \ge 3)$. (Recall that, under the conditions previously
given, $9S^2$ has a $\chi^2$ distribution with 9 df.)


7.19 Ammeters produced by a manufacturer are marketed under the specification that the standard
deviation of gauge readings is no larger than .2 amp. One of these ammeters was used to make ten
independent readings on a test circuit with constant current. If the sample variance of these ten
measurements is .065 and it is reasonable to assume that the readings are normally distributed,
do the results suggest that the ammeter used does not meet the marketing specifications? [Hint:
Find the approximate probability that the sample variance will exceed .065 if the true population
variance is .04.]




7.20 a If $U$ has a $\chi^2$ distribution with $\nu$ df, find $E(U)$ and $V(U)$.

b Using the results of Theorem 7.3, find $E(S^2)$ and $V(S^2)$ when $Y_1, Y_2, \ldots, Y_n$ is a random
sample from a normal distribution with mean $\mu$ and variance $\sigma^2$.


7.21 Refer to Exercise 7.13. Suppose that $n = 20$ observations are to be taken on ln(LC50)
measurements and that $\sigma^2 = 1.4$. Let $S^2$ denote the sample variance of the 20 measurements.

a Find a number $b$ such that $P(S^2 \le b) = .975$.
b Find a number $a$ such that $P(a \le S^2) = .975$.
c If $a$ and $b$ are as in parts (a) and (b), what is $P(a \le S^2 \le b)$?


7.22 Applet Exercise As we stated in Definition 4.10, a random variable $Y$ has a $\chi^2$ distribution
with $\nu$ df if and only if $Y$ has a gamma distribution with $\alpha = \nu/2$ and $\beta = 2$.

a Use the applet Comparison of Gamma Density Functions to graph $\chi^2$ densities with 10,
40, and 80 df.

b What do you notice about the shapes of these density functions? Which of them is most
symmetric?

c In Exercise 7.97, you will show that for large values of $\nu$, a $\chi^2$ random variable has a
distribution that can be approximated by a normal distribution with $\mu = \nu$ and $\sigma = \sqrt{2\nu}$.
How do the mean and standard deviation of the approximating normal distribution compare
to the mean and standard deviation of the $\chi^2$ random variable $Y$?

d Refer to the graphs of the $\chi^2$ densities that you obtained in part (a). In part (c), we stated
that, if the number of degrees of freedom is large, the $\chi^2$ distribution can be approximated
with a normal distribution. Does this surprise you? Why?


7.23 Applet Exercise

a Use the applet Chi-Square Probabilities and Quantiles to find $P[Y > E(Y)]$ when $Y$ has
$\chi^2$ distributions with 10, 40, and 80 df.

b What did you notice about $P[Y > E(Y)]$ as the number of degrees of freedom increases
as in part (a)?

c How does what you observed in part (b) relate to the shapes of the $\chi^2$ densities that you
obtained in Exercise 7.22?


7.24 Applet Exercise Refer to Example 7.6. Suppose that $T$ has a t distribution with 5 df.

a Use the applet Student's t Probabilities and Quantiles to find the exact probability that $T$
is greater than 2.

b Use the applet Student's t Probabilities and Quantiles to find the exact probability that $T$
is less than $-2$.

c Use the applet Student's t Probabilities and Quantiles to find the exact probability that $T$
is between $-2$ and 2.

d Your answer to part (c) is considerably less than $0.9544 = P(-2 \le Z \le 2)$. Refer to
Figure 7.3 and explain why this is as expected.


7.25 Applet Exercise Suppose that $T$ is a t-distributed random variable.

a If $T$ has 5 df, use Table 5, Appendix 3, to find $t_{.10}$, the value such that $P(T > t_{.10}) = .10$.
Find $t_{.10}$ using the applet Student's t Probabilities and Quantiles.

b Refer to part (a). What quantile does $t_{.10}$ correspond to? Which percentile?

c Use the applet Student's t Probabilities and Quantiles to find the value of $t_{.10}$ for t
distributions with 30, 60, and 120 df.

d When $Z$ has a standard normal distribution, $P(Z > 1.282) = .10$ and $z_{.10} = 1.282$. What
property of the t distribution (when compared to the standard normal distribution) explains
the fact that all of the values obtained in part (c) are larger than $z_{.10} = 1.282$?

e What do you observe about the relative sizes of the values of $t_{.10}$ for t distributions with
30, 60, and 120 df? Guess what $t_{.10}$ "converges to" as the number of degrees of freedom
gets large. [Hint: Look at the row labeled $\infty$ in Table 5, Appendix 3.]


7.26 Refer to Exercise 7.11. Suppose that in the forest fertilization problem the population standard
deviation of basal areas is not known and must be estimated from the sample. If a random
sample of $n = 9$ basal areas is to be measured, find two statistics $g_1$ and $g_2$ such that
$P[g_1 \le (\bar{Y} - \mu) \le g_2] = .90$.


7.27 Applet Exercise Refer to Example 7.7. If we take independent samples of sizes $n_1 = 6$ and
$n_2 = 10$ from two normal populations with equal population variances, use the applet F-Ratio
Probabilities and Quantiles to find

a $P(S_1^2/S_2^2 > 2)$.
b $P(S_1^2/S_2^2 < 0.5)$.
c the probability that one of the sample variances is at least twice as big as the other.


7.28 Applet Exercise Suppose that $Y$ has an F distribution with $\nu_1 = 4$ numerator degrees of
freedom and $\nu_2 = 6$ denominator degrees of freedom.

a Use Table 7, Appendix 3, to find $F_{.025}$. Also find $F_{.025}$ using the applet F-Ratio Probabilities
and Quantiles.

b Refer to part (a). What quantile of $Y$ does $F_{.025}$ correspond to? What percentile?

c Refer to parts (a) and (b). Use the applet F-Ratio Probabilities and Quantiles to find $F_{.975}$,
the .025 quantile (2.5th percentile) of the distribution of $Y$.

d If $U$ has an F distribution with $\nu_1 = 6$ numerator and $\nu_2 = 4$ denominator degrees of
freedom, use Table 7, Appendix 3, or the F-Ratio Probabilities and Quantiles applet to
find $F_{.025}$.

e In Exercise 7.29, you will show that if $Y$ is a random variable that has an F distribution
with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom, then $U = 1/Y$ has an F
distribution with $\nu_2$ numerator and $\nu_1$ denominator degrees of freedom. Does this result
explain the relationship between $F_{.975}$ from part (c) (4 numerator and 6 denominator degrees
of freedom) and $F_{.025}$ from part (d) (6 numerator and 4 denominator degrees of freedom)?
What is this relationship?


7.29 If $Y$ is a random variable that has an F distribution with $\nu_1$ numerator and $\nu_2$ denominator
degrees of freedom, show that $U = 1/Y$ has an F distribution with $\nu_2$ numerator and $\nu_1$
denominator degrees of freedom.


7.30 Suppose that $Z$ has a standard normal distribution and that $Y$ is an independent $\chi^2$-distributed
random variable with $\nu$ df. Then, according to Definition 7.2,

$$T = \frac{Z}{\sqrt{Y/\nu}}$$

has a t distribution with $\nu$ df.¹

a If $Z$ has a standard normal distribution, give $E(Z)$ and $E(Z^2)$. [Hint: For any random
variable, $E(Z^2) = V(Z) + (E(Z))^2$.]

b According to the result derived in Exercise 4.112(a), if $Y$ has a $\chi^2$ distribution with $\nu$ df,
then

$$E(Y^a) = \frac{\Gamma([\nu/2] + a)}{\Gamma(\nu/2)}\, 2^a, \quad \text{if } \nu > -2a.$$

Use this result, the result from part (a), and the structure of $T$ to show the following.
[Hint: Recall the independence of $Z$ and $Y$.]
i $E(T) = 0$, if $\nu > 1$.
ii $V(T) = \nu/(\nu - 2)$, if $\nu > 2$.


1. Exercises preceded by an asterisk are optional.

a Use Table 7, Appendix 3, to find Fo; for F-distributed random variables, all with 4 numer- 
ator degrees of freedom, but with denominator degrees of freedom of 10, 15, 30, 60, 120, 
and oo. 

b Referto part (а). What do you observe about the values of F o; as the number of denominator 
degrees of freedom increases? 

c What is x2, for a x°-distributed random variable with 4 df? 

d Divide the value of Xi (4 df) from part (c) by the value of Fo; (numerator df = 4; 
denominator df — oo). Explain why the value that you obtained is a reasonable value 


for the ratio. [Hint: Consider the definition of an F-distributed random variable given in 
Definition 7.3.] 


Applet Exercise 


a Find fos for a t-distributed random variable with 5 df. 
Refer to part (a). What is P(T? > t35)? 
Find Fo for an F-distributed random variable with 1 numerator degree of freedom and 5 
denominator degrees of freedom. 
Compare the value of F 1o found in part (c) with the value of i: from parts (a) and (b). 
In Exercise 7.33, you will show that if T has a t distribution with v df, then U — T? has an 
F distribution with 1 numerator degree of freedom and v denominator degrees of freedom. 
How does this explain the relationship between the values of F 1o (1 num. df, 5 denom df) 
and unt (5 df) that you observed in part (d)? 


Use the structures of T and F given in Definitions 7.2 and 7.3, respectively, to argue that if Т 
has a t distribution with v df, then U = T? has an F distribution with 1 numerator degree of 
freedom and v denominator degrees of freedom. 


Suppose that $W_1$ and $W_2$ are independent $\chi^2$-distributed random variables with $\nu_1$ and $\nu_2$ df,
respectively. According to Definition 7.3,

$$F = \frac{W_1/\nu_1}{W_2/\nu_2}$$

has an F distribution with $\nu_1$ and $\nu_2$ numerator and denominator degrees of freedom,
respectively. Use the preceding structure of $F$, the independence of $W_1$ and $W_2$, and the result
summarized in Exercise 7.30(b) to show

a $E(F) = \nu_2/(\nu_2 - 2)$, if $\nu_2 > 2$.

b $V(F) = [2\nu_2^2(\nu_1 + \nu_2 - 2)]/[\nu_1(\nu_2 - 2)^2(\nu_2 - 4)]$, if $\nu_2 > 4$.

Refer to Exercise 7.34. Suppose that $F$ has an F distribution with $\nu_1 = 50$ numerator degrees
of freedom and $\nu_2 = 70$ denominator degrees of freedom. Notice that Table 7, Appendix 3,
does not contain entries for 50 numerator degrees of freedom and 70 denominator degrees of
freedom.

a What is $E(F)$?
b Give $V(F)$.
c Is it likely that $F$ will exceed 3? [Hint: Use Tchebysheff’s theorem.]


Let $S_1^2$ denote the sample variance for a random sample of ten ln(LC50) values for copper and
let $S_2^2$ denote the sample variance for a random sample of eight ln(LC50) values for lead, both
samples using the same species of fish. The population variance for measurements on copper is
assumed to be twice the corresponding population variance for measurements on lead. Assume
$S_1^2$ to be independent of $S_2^2$.

a Find a number $b$ such that

$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = .95.$$

b Find a number $a$ such that

$$P\left(a \le \frac{S_1^2}{S_2^2}\right) = .95.$$

[Hint: Use the result of Exercise 7.29 and notice that $P(U_1/U_2 \le k) = P(U_2/U_1 \ge 1/k)$.]

c If $a$ and $b$ are as in parts (a) and (b), find

$$P\left(a \le \frac{S_1^2}{S_2^2} \le b\right).$$


Let $Y_1, Y_2, \ldots, Y_5$ be a random sample of size 5 from a normal population with mean 0 and
variance 1 and let $\bar{Y} = (1/5)\sum_{i=1}^{5} Y_i$. Let $Y_6$ be another independent observation from the
same population. What is the distribution of

a $W = \sum_{i=1}^{5} Y_i^2$? Why?
b $U = \sum_{i=1}^{5} (Y_i - \bar{Y})^2$? Why?
c $\sum_{i=1}^{5} (Y_i - \bar{Y})^2 + Y_6^2$? Why?

Suppose that $Y_1, Y_2, \ldots, Y_5, Y_6, \bar{Y}, W$, and $U$ are as defined in Exercise 7.37. What is the
distribution of

a $\sqrt{5}\,Y_6/\sqrt{W}$? Why?
b $2Y_6/\sqrt{U}$? Why?
c $2(5\bar{Y}^2 + Y_6^2)/U$? Why?


Suppose that independent samples (of sizes $n_i$) are taken from each of $k$ populations and that
population $i$ is normally distributed with mean $\mu_i$ and variance $\sigma^2$, $i = 1, 2, \ldots, k$. That is,
all populations are normally distributed with the same variance but with (possibly) different
means. Let $\bar{X}_i$ and $S_i^2$, $i = 1, 2, \ldots, k$ be the respective sample means and variances. Let
$\theta = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$, where $c_1, c_2, \ldots, c_k$ are given constants.

a Give the distribution of $\hat{\theta} = c_1\bar{X}_1 + c_2\bar{X}_2 + \cdots + c_k\bar{X}_k$. Provide reasons for any claims
that you make.

b Give the distribution of

$$\frac{\text{SSE}}{\sigma^2}, \quad \text{where SSE} = \sum_{i=1}^{k} (n_i - 1)S_i^2.$$

Provide reasons for any claims that you make.

c Give the distribution of

Provide reasons for any claims that you make.

FIGURE 7.5 An exponential density function


7.3 The Central Limit Theorem


In Chapter 5, we showed that if $Y_1, Y_2, \ldots, Y_n$ represents a random sample from any
distribution with mean $\mu$ and variance $\sigma^2$, then $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$. In this
section, we will develop an approximation for the sampling distribution of $\bar{Y}$ that can
be used regardless of the distribution of the population from which the sample is taken.

If we sample from a normal population, Theorem 7.1 tells us that $\bar{Y}$ has a normal
sampling distribution. But what can we say about the sampling distribution of $\bar{Y}$ if
the variables $Y_i$ are not normally distributed? Fortunately, $\bar{Y}$ will have a sampling
distribution that is approximately normal if the sample size is large. The formal
statement of this result is called the central limit theorem. Before we state this theorem,
however, we will look at some empirical investigations that demonstrate the sampling
distribution of $\bar{Y}$.

A computer was used to generate random samples of size $n$ from an exponential
density function with mean 10, that is, from a population with density

$$f(y) = \begin{cases} (1/10)e^{-y/10}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

A graph of this density function is given in Figure 7.5. The sample mean was computed
for each sample, and the relative frequency histogram for the values of the sample
means for 1000 samples, each of size $n = 5$, is shown in Figure 7.6. Notice that
Figure 7.6 portrays a histogram that is roughly mound-shaped, but the histogram is
slightly skewed.

Figure 7.7 is a graph of a similar relative frequency histogram of the values of the
sample mean for 1000 samples, each of size $n = 25$. In this case, Figure 7.7 shows a
mound-shaped and nearly symmetric histogram, which can be approximated quite
closely with a normal density function.


FIGURE 7.6 Relative frequency histogram: sample means for 1000 samples (n = 5) from an exponential distribution

FIGURE 7.7 Relative frequency histogram: sample means for 1000 samples (n = 25) from an exponential distribution

Recall from Chapter 5 that $E(\bar{Y}) = \mu_{\bar{Y}} = \mu$ and $V(\bar{Y}) = \sigma^2_{\bar{Y}} = \sigma^2/n$. For
the exponential density function used in the simulations, $\mu = E(Y_i) = 10$ and
$\sigma^2 = V(Y_i) = (10)^2 = 100$. Thus, for this example, we see that

$$\mu_{\bar{Y}} = E(\bar{Y}) = \mu = 10 \quad \text{and} \quad \sigma^2_{\bar{Y}} = V(\bar{Y}) = \frac{\sigma^2}{n} = \frac{100}{n}.$$

For each value of $n$ (5 and 25), we calculated the average of the 1000 sample means
generated in the study. The observed variance of the 1000 sample means was also
calculated for each value of $n$. The results are shown in Table 7.1. In each empirical
study ($n = 5$ and $n = 25$), the average of the observed sample means and the variance
of the observed sample means are quite close to the theoretical values.
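The empirical study described above can be repeated with a few lines of code. The following is a minimal sketch, not the program used for Table 7.1: the function name `simulate_sample_means` and the seed are illustrative assumptions, and because the random draws differ, the printed averages and variances will be close to, but not equal to, the tabled values.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=1000, mean=10.0, seed=0):
    """Return the sample means of num_samples exponential samples of size n."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(num_samples):
        # random.expovariate takes the rate parameter, which is 1 / mean.
        sample = [rng.expovariate(1.0 / mean) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return sample_means


for n in (5, 25):
    means = simulate_sample_means(n)
    # Compare with the theoretical values mu = 10 and sigma^2 / n = 100 / n.
    print(n, round(statistics.fmean(means), 2), round(statistics.variance(means), 2))
```

A histogram of `means` for each value of n should show the same pattern as Figures 7.6 and 7.7: mild right skew for n = 5 and a nearly symmetric mound for n = 25.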

Table 7.1 Calculations for 1000 sample means

Sample    Average of 1000                         Variance of 1000
size      sample means      $\mu_{\bar{Y}} = \mu$      sample means       $\sigma^2_{\bar{Y}} = \sigma^2/n$
n = 5     9.86              10                    19.63              20
n = 25    9.95              10                    3.93               4

We now give a formal statement of the central limit theorem.


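THEOREM 7.4 Central Limit Theorem: Let $Y_1, Y_2, \ldots, Y_n$ be independent and identically
distributed random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2 < \infty$. Define

$$U_n = \frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma\sqrt{n}} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}, \quad \text{where } \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$

Then the distribution function of $U_n$ converges to the standard normal distribution
function as $n \to \infty$. That is,

$$\lim_{n \to \infty} P(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \quad \text{for all } u.$$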


The central limit theorem implies that probability statements about $U_n$ can be
approximated by corresponding probabilities for the standard normal random variable if $n$
is large. (Usually, a value of $n$ greater than 30 will ensure that the distribution of $U_n$
can be closely approximated by a normal distribution.)

As a matter of convenience, the conclusion of the central limit theorem is often
replaced with the simpler statement that $\bar{Y}$ is asymptotically normally distributed with
mean $\mu$ and variance $\sigma^2/n$. The central limit theorem can be applied to a random
sample $Y_1, Y_2, \ldots, Y_n$ from any distribution as long as $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$
are both finite and the sample size is large.

We will give some examples of the use of the central limit theorem but will defer
the proof until the next section (coverage of which is optional). The proof is not needed
for an understanding of the applications of the central limit theorem that appear in
this text.


EXAMPLE 7.8 Achievement test scores of all high school seniors in a state have mean 60 and variance
64. A random sample of $n = 100$ students from one large high school had a mean
score of 58. Is there evidence to suggest that this high school is inferior? (Calculate
the probability that the sample mean is at most 58 when $n = 100$.)

Solution Let $\bar{Y}$ denote the mean of a random sample of $n = 100$ scores from a population with
$\mu = 60$ and $\sigma^2 = 64$. We want to approximate $P(\bar{Y} \le 58)$. We know from Theorem
7.4 that $(\bar{Y} - \mu)/(\sigma/\sqrt{n})$ has a distribution that can be approximated by a standard
normal distribution. Hence, using Table 4, Appendix 3, we have

$$P(\bar{Y} \le 58) = P\left(\frac{\bar{Y} - 60}{8/\sqrt{100}} \le \frac{58 - 60}{.8}\right) \approx P(Z \le -2.5) = .0062.$$

Because this probability is so small, it is unlikely that the sample from the school
of interest can be regarded as a random sample from a population with $\mu = 60$ and
$\sigma^2 = 64$. The evidence suggests that the average score for this high school is lower
than the overall average of $\mu = 60$.

This example illustrates the use of probability in the process of testing hypotheses,
a common technique of statistical inference that will be further discussed in
Chapter 10.
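The calculation in Example 7.8 can be checked numerically instead of with Table 4. This is a minimal sketch; the helper name `prob_sample_mean_at_most` is an illustrative assumption, and the standard normal distribution function comes from Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_at_most(x, mu, sigma, n):
    """Central limit theorem approximation to P(Ybar <= x)."""
    # Standardize: (x - mu) / (sigma / sqrt(n)), then use the standard normal CDF.
    z = (x - mu) / (sigma / sqrt(n))
    return NormalDist().cdf(z)


# Example 7.8: mu = 60, sigma^2 = 64 (sigma = 8), n = 100.
p = prob_sample_mean_at_most(58, mu=60, sigma=8, n=100)
print(round(p, 4))  # 0.0062
```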


EXAMPLE 7.9 The service times for customers coming through a checkout counter in a retail store are
independent random variables with mean 1.5 minutes and variance 1.0. Approximate
the probability that 100 customers can be served in less than 2 hours of total service
time.

Solution If we let $Y_i$ denote the service time for the $i$th customer, then we want

$$P\left(\sum_{i=1}^{100} Y_i \le 120\right) = P\left(\bar{Y} \le \frac{120}{100}\right) = P(\bar{Y} \le 1.20).$$

Because the sample size is large, the central limit theorem tells us that $\bar{Y}$ is
approximately normally distributed with mean $\mu_{\bar{Y}} = \mu = 1.5$ and variance $\sigma^2_{\bar{Y}} = \sigma^2/n =
1.0/100$. Therefore, using Table 4, Appendix 3, we have

$$P(\bar{Y} \le 1.20) = P\left(\frac{\bar{Y} - 1.50}{1/\sqrt{100}} \le \frac{1.20 - 1.50}{1/\sqrt{100}}\right) \approx P[Z \le (1.2 - 1.5)10] = P(Z \le -3) = .0013.$$

Thus, the probability that 100 customers can be served in less than 2 hours is
approximately .0013. This small probability indicates that it is virtually impossible
to serve 100 customers in only 2 hours.
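Example 7.9 can also be worked directly on the total service time, since the central limit theorem implies that the sum of n independent service times is approximately normal with mean nμ and variance nσ². The sketch below uses that equivalent form; the helper name `prob_total_at_most` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def prob_total_at_most(total, mu, sigma, n):
    """Central limit theorem approximation to P(Y_1 + ... + Y_n <= total)."""
    # The sum is approximately normal with mean n*mu and standard deviation sigma*sqrt(n).
    return NormalDist(mu=n * mu, sigma=sigma * sqrt(n)).cdf(total)


# Example 7.9: mean 1.5 minutes, variance 1.0, n = 100 customers, 120 minutes.
p = prob_total_at_most(120, mu=1.5, sigma=1.0, n=100)
print(round(p, 4))  # 0.0013
```

Standardizing the total, (120 - 150)/10 = -3, gives the same z-value as standardizing the sample mean in the solution.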
Exercises 

7.40 Applet Exercise Suppose that the population of interest does not have a normal distribution. 


What does the sampling distribution of Y look like, and what is the effect of the sample size on 
the sampling distribution of Y? Use the applet SampleSize to complete the following. Use the 
up/down arrow to the left of the histogram of the population distribution to select the "Skewed" 
distribution. What is the mean and standard deviation of the population from which samples 
will be selected? [These values are labeled M and S, respectively, and are given above the 
population histogram.] 


a Use the up/down arrows in the left and right “Sample Size” boxes to select samples of size
1 and 3. Click the button “1 Sample” a few times. What is similar about the two histograms
that you generated? What is different about them?

b Click the button “1000 Samples” a few times and answer the questions in part (a). Do the
generated histograms have the shapes that you expected? Why?

c Are the means and standard deviations of the two sampling distributions close to the values
that you expected? [Hint: $V(\bar{Y}) = \sigma^2/n$.]

d Click the button “Toggle Normal.” What do you observe about the adequacy of the approx- 
imating normal distributions? 

e Click on the two generated sampling distributions to pop up windows for each. Use the 
up/down arrows in the left and right “Sample Size” boxes to select samples of size 10 and 
25. Click the button “Toggle Normal.” You now have graphs of the sampling distributions 
of the sample means based on samples of size 1, 3, 10, and 25. What do you observe about 
the adequacy of the normal approximation as the sample size increases? 


Applet Exercise Refer to Exercise 7.40. Use the applet SampleSize to complete the following. 
Use the up/down arrow to the left of the histogram of the population distribution to select the 
“U-shaped” distribution. What is the mean and standard deviation of the population from which 
samples will be selected? 


a Answer the questions in parts (a) through (e) of Exercise 7.40. 


b Refer to part (a). When you examined the sampling distribution of $\bar{Y}$ for $n = 3$, the sampling
distribution had a “valley” in the middle. Why did this occur? Use the applet Basic to find
out. Select the “U-shaped” population distribution and click the button “1 Sample.” What
do you observe about the values of individual observations in the sample? Click the button
“1 Sample” several more times. Do the values in the sample tend to be either (relatively)
large or small with few values in the “middle”? Why? What effect does this have on the
value of the sample mean? [Hint: 3 is an odd sample size.]


The fracture strength of tempered glass averages 14 (measured in thousands of pounds per 
square inch) and has standard deviation 2. 


a What is the probability that the average fracture strength of 100 randomly selected pieces 
of this glass exceeds 14.5? 


b Find an interval that includes, with probability 0.95, the average fracture strength of 100 
randomly selected pieces of this glass. 


An anthropologist wishes to estimate the average height of men for a certain race of people. If 
the population standard deviation is assumed to be 2.5 inches and if she randomly samples 100 
men, find the probability that the difference between the sample mean and the true population 
mean will not exceed .5 inch. 


Suppose that the anthropologist of Exercise 7.43 wants the difference between the sample mean 
and the population mean to be less than .4 inch, with probability .95. How many men should 
she sample to achieve this objective? 


Workers employed in a large service industry have an average wage of $7.00 per hour with 
a standard deviation of $.50. The industry has 64 workers of a certain ethnic group. These 
workers have an average wage of $6.90 per hour. Is it reasonable to assume that the wage rate 
of the ethnic group is equivalent to that of a random sample of workers from those employed 
in the service industry? [Hint: Calculate the probability of obtaining a sample mean less than 
or equal to $6.90 per hour.] 


The acidity of soils is measured by a quantity called the pH, which may range from 0 (high 
acidity) to 14 (high alkalinity). A soil scientist wants to estimate the average pH for a large 
field by randomly selecting $n$ core samples and measuring the pH in each sample. Although
the population standard deviation of pH measurements is not known, past experience indicates
that most soils have a pH value of between 5 and 8. If the scientist selects $n = 40$ samples, find
the approximate probability that the sample mean of the 40 pH measurements will be within
.2 unit of the true average pH for the field. [Hint: See Exercise 1.17.]


Suppose that the scientist of Exercise 7.46 would like the sample mean to be within .1 of the 
true mean with probability .90. How many core samples should the scientist take? 


An important aspect of a federal economic plan was that consumers would save a substantial 
portion of the money that they received from an income tax reduction. Suppose that early 
estimates of the portion of total tax saved, based on a random sampling of 35 economists, had 
mean 26% and standard deviation 12%. 


a What is the approximate probability that a sample mean estimate, based on a random 
sample of n = 35 economists, will lie within 1% of the mean of the population of the 
estimates of all economists? 

b Is it necessarily true that the mean of the population of estimates of all economists is equal
to the percent tax saving that will actually be achieved?


The length of time required for the periodic maintenance of an automobile or another machine 
usually has a mound-shaped probability distribution. Because some occasional long service 
times will occur, the distribution tends to be skewed to the right. Suppose that the length of time 
required to run a 5000-mile check and to service an automobile has mean 1.4 hours and standard 
deviation .7 hour. Suppose also that the service department plans to service 50 automobiles per 
8-hour day and that, in order to do so, it can spend a maximum average service time of only 1.6 
hours per automobile. On what proportion of all workdays will the service department have to 
work overtime? 


Shear strength measurements for spot welds have been found to have standard deviation 10 
pounds per square inch (psi). If 100 test welds are to be measured, what is the approximate 
probability that the sample mean will be within 1 psi of the true population mean? 


Refer to Exercise 7.50. If the standard deviation of shear strength measurements for spot welds 
is 10 psi, how many test welds should be sampled if we want the sample mean to be within 
1 psi of the true mean with probability approximately .99? 


Resistors to be used in a circuit have average resistance 200 ohms and standard deviation 
10 ohms. Suppose 25 of these resistors are randomly selected to be used in a circuit. 


a What is the probability that the average resistance for the 25 resistors is between 199 and 
202 ohms? 

b Find the probability that the total resistance does not exceed 5100 ohms. [Hint: See Example
7.9.]


One-hour carbon monoxide concentrations in air samples from a large city average 12 ppm 
(parts per million) with standard deviation 9 ppm. 


a Do you think that carbon monoxide concentrations in air samples from this city are normally
distributed? Why or why not?

b Find the probability that the average concentration in 100 randomly selected samples will 
exceed 14 ppm. 


Unaltered bitumens, as commonly found in lead-zinc deposits, have atomic hydrogen/carbon 
(H/C) ratios that average 1.4 with standard deviation .05. Find the probability that the average 
H/C ratio is less than 1.3 if we randomly select 25 bitumen samples.

The downtime per day for a computing facility has mean 4 hours and standard deviation .8 hour.


a Suppose that we want to compute probabilities about the average daily downtime for a 
period of 30 days. 


i What assumptions must be true to use the result of Theorem 7.4 to obtain a valid 
approximation for probabilities about the average daily downtime? 

ii Under the assumptions described in part (i), what is the approximate probability that 
the average daily downtime for a period of 30 days is between 1 and 5 hours? 


b Under the assumptions described in part (a), what is the approximate probability that the 
total downtime for a period of 30 days is less than 115 hours? 


Many bulk products—such as iron ore, coal, and raw sugar—are sampled for quality by a 
method that requires many small samples to be taken periodically as the material is moving 
along a conveyor belt. The small samples are then combined and mixed to form one composite 
sample. Let $Y_i$ denote the volume of the $i$th small sample from a particular lot and suppose that
$Y_1, Y_2, \ldots, Y_n$ constitute a random sample, with each $Y_i$ value having mean $\mu$ (in cubic inches)
and variance $\sigma^2$. The average volume $\mu$ of the samples can be set by adjusting the size of the
sampling device. Suppose that the variance $\sigma^2$ of the volumes of the samples is known to be
approximately 4. The total volume of the composite sample must exceed 200 cubic inches with
probability approximately .95 when $n = 50$ small samples are selected. Determine a setting
for $\mu$ that will allow the sampling requirements to be satisfied.


Twenty-five heat lamps are connected in a greenhouse so that when one lamp fails, another takes 
over immediately. (Only one lamp is turned on at any time.) The lamps operate independently, 
and each has a mean life of 50 hours and standard deviation of 4 hours. If the greenhouse is 
not checked for 1300 hours after the lamp system is turned on, what is the probability that a 
lamp will be burning at the end of the 1300-hour period? 


Suppose that $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples from
populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Show that the random
variable

$$U_n = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}}$$

satisfies the conditions of Theorem 7.4 and thus that the distribution function of $U_n$ converges
to a standard normal distribution function as $n \to \infty$. [Hint: Consider $W_i = X_i - Y_i$, for
$i = 1, 2, \ldots, n$.]


An experiment is designed to test whether operator A or operator B gets the job of operating 
a new machine. Each operator is timed on 50 independent trials involving the performance 
of a certain task using the machine. If the sample means for the 50 trials differ by more than 
1 second, the operator with the smaller mean time gets the job. Otherwise, the experiment is 
considered to end in a tie. If the standard deviations of times for both operators are assumed to 
be 2 seconds, what is the probability that operator A will get the job even though both operators 
have equal ability? 


The result in Exercise 7.58 holds even if the sample sizes differ. That is, if $X_1, X_2, \ldots, X_{n_1}$
and $Y_1, Y_2, \ldots, Y_{n_2}$ constitute independent random samples from populations with means $\mu_1$
and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then $\bar{X} - \bar{Y}$ will be approximately normally
distributed, for large $n_1$ and $n_2$, with mean $\mu_1 - \mu_2$ and variance $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$.

The flow of water through soil depends on, among other things, the porosity (volume
proportion of voids) of the soil. To compare two types of sandy soil, $n_1 = 50$ measurements
are to be taken on the porosity of soil A and $n_2 = 100$ measurements are to be taken on soil B.
Assume that $\sigma_1^2 = .01$ and $\sigma_2^2 = .02$. Find the probability that the difference between the
sample means will be within .05 unit of the difference between the population means $\mu_1 - \mu_2$.


Refer to Exercise 7.60. Suppose that $n_1 = n_2 = n$, and find the value of $n$ that allows the
difference between the sample means to be within .04 unit of $\mu_1 - \mu_2$ with probability .90.


The times that a cashier spends processing individual customers' orders are independent random
variables with mean 2.5 minutes and standard deviation 2 minutes. What is the approximate
probability that it will take more than 4 hours to process the orders of 100 people?


Refer to Exercise 7.62. Find the number of customers n such that the probability that the orders 
of all n customers can be processed in less than 2 hours is approximately .1. 


7.4 A Proof of the Central Limit Theorem (Optional)


We will sketch a proof of the central limit theorem for the case in which the moment- 
generating functions exist for the random variables in the sample. The proof depends 
upon a fundamental result of probability theory, which cannot be proved here but that 
is stated in Theorem 7.5. 


THEOREM 7.5 Let $Y$ and $Y_1, Y_2, Y_3, \ldots$ be random variables with moment-generating
functions $m(t)$ and $m_1(t), m_2(t), m_3(t), \ldots$, respectively. If

$$\lim_{n \to \infty} m_n(t) = m(t) \quad \text{for all real } t,$$

then the distribution function of $Y_n$ converges to the distribution function of $Y$
as $n \to \infty$.


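We now give the proof of the central limit theorem, Theorem 7.4.

Proof Write

$$U_n = \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\left(\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i, \quad \text{where } Z_i = \frac{Y_i - \mu}{\sigma}.$$

Because the random variables $Y_i$ are independent and identically distributed,
$Z_i$, $i = 1, 2, \ldots, n$, are independent and identically distributed with $E(Z_i) = 0$
and $V(Z_i) = 1$.

Since the moment-generating function of the sum of independent random
variables is the product of their individual moment-generating functions,

$$m_{\sum Z_i}(t) = m_{Z_1}(t) \times m_{Z_2}(t) \times \cdots \times m_{Z_n}(t) = [m_{Z_1}(t)]^n$$

and

$$m_{U_n}(t) = m_{\sum Z_i}\left(\frac{t}{\sqrt{n}}\right) = \left[m_{Z_1}\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$

By Taylor's theorem, with remainder (see your Calculus II text),

$$m_{Z_1}(t) = m_{Z_1}(0) + m'_{Z_1}(0)\,t + m''_{Z_1}(\xi)\,\frac{t^2}{2}, \quad \text{where } 0 < \xi < t,$$

and because $m_{Z_1}(0) = E(e^{0 Z_1}) = E(1) = 1$ and $m'_{Z_1}(0) = E(Z_1) = 0$,

$$m_{Z_1}(t) = 1 + \frac{m''_{Z_1}(\xi)}{2}\,t^2, \quad \text{where } 0 < \xi < t.$$

Therefore,

$$m_{U_n}(t) = \left[1 + \frac{m''_{Z_1}(\xi_n)}{2}\left(\frac{t}{\sqrt{n}}\right)^2\right]^n = \left[1 + \frac{m''_{Z_1}(\xi_n)\,t^2/2}{n}\right]^n, \quad \text{where } 0 < \xi_n < \frac{t}{\sqrt{n}}.$$

Notice that as $n \to \infty$, $\xi_n \to 0$ and $m''_{Z_1}(\xi_n)\,t^2/2 \to m''_{Z_1}(0)\,t^2/2 =
E(Z_1^2)\,t^2/2 = t^2/2$ because $E(Z_1^2) = V(Z_1) = 1$. Recall that if

$$\lim_{n \to \infty} b_n = b \quad \text{then} \quad \lim_{n \to \infty}\left(1 + \frac{b_n}{n}\right)^n = e^b.$$

Finally,

$$\lim_{n \to \infty} m_{U_n}(t) = \lim_{n \to \infty}\left[1 + \frac{m''_{Z_1}(\xi_n)\,t^2/2}{n}\right]^n = e^{t^2/2},$$

the moment-generating function for a standard normal random variable. Applying
Theorem 7.5, we conclude that $U_n$ has a distribution function that converges
to the distribution function of the standard normal random variable.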
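The limit at the end of the proof can be illustrated numerically for one specific population. The sketch below assumes, purely for illustration, that the $Y_i$ are exponential with mean 1, so that $Z_i = Y_i - 1$ has moment-generating function $m_{Z_1}(s) = e^{-s}/(1 - s)$ for $s < 1$; the function name `mgf_standardized_mean` is hypothetical. It evaluates $m_{U_n}(t) = [m_{Z_1}(t/\sqrt{n})]^n$ for increasing n and prints it next to $e^{t^2/2}$.

```python
from math import exp, log1p, sqrt


def mgf_standardized_mean(t, n):
    """MGF of U_n at t when the Y_i are exponential with mean 1 (an assumed example)."""
    s = t / sqrt(n)
    if s >= 1:
        raise ValueError("the MGF exists only for t / sqrt(n) < 1")
    # log m_Z(s) = -s - log(1 - s); log1p keeps the difference accurate for small s.
    return exp(n * (-s - log1p(-s)))


t = 1.0
for n in (4, 100, 10_000, 1_000_000):
    # The first value should approach the second, exp(t^2 / 2), as n grows.
    print(n, mgf_standardized_mean(t, n), exp(t * t / 2))
```

For this population the gap shrinks roughly in proportion to $1/\sqrt{n}$, which is consistent with the slow disappearance of skewness seen in the simulated histograms.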


7.5 The Normal Approximation to the Binomial Distribution


The central limit theorem also can be used to approximate probabilities for some 
discrete random variables when the exact probabilities are tedious to calculate. One 
useful example involves the binomial distribution for large values of the number of 
trials n. 

Suppose that $Y$ has a binomial distribution with $n$ trials and probability of success
on any one trial denoted by $p$. If we want to find $P(Y \le b)$, we can use the binomial
probability function to compute $P(Y = y)$ for each nonnegative integer $y$ less than
or equal to $b$ and then sum these probabilities. Tables are available for some values
of the sample size $n$, but direct calculation is cumbersome for large values of $n$ for
which tables may be unavailable.

Alternatively, we can view $Y$, the number of successes in $n$ trials, as a sum of a
sample consisting of 0s and 1s; that is,

$$Y = \sum_{i=1}^{n} X_i,$$

where

$$X_i = \begin{cases} 1, & \text{if the } i\text{th trial results in success,} \\ 0, & \text{otherwise.} \end{cases}$$

The random variables $X_i$ for $i = 1, 2, \ldots, n$ are independent (because the trials are
independent), and it is easy to show that $E(X_i) = p$ and $V(X_i) = p(1 - p)$ for
$i = 1, 2, \ldots, n$. Consequently, when $n$ is large, the sample fraction of successes,

$$\frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X},$$

possesses an approximately normal sampling distribution with mean $E(X_i) = p$ and
variance $V(X_i)/n = p(1 - p)/n$.

Thus, we have used Theorem 7.4 (the central limit theorem) to establish that if $Y$
is a binomial random variable with parameters $n$ and $p$ and if $n$ is large, then $Y/n$
has approximately the same distribution as $U$, where $U$ is normally distributed with
mean $\mu_U = p$ and variance $\sigma_U^2 = p(1 - p)/n$. Equivalently, for large $n$, we can
think of $Y$ as having approximately the same distribution as $W$, where $W$ is normally
distributed with mean $\mu_W = np$ and variance $\sigma_W^2 = np(1 - p)$.


EXAMPLE 7.10 Candidate A believes that she can win a city election if she can earn at least 55% of
the votes in precinct 1. She also believes that about 50% of the city's voters favor her.
If $n = 100$ voters show up to vote at precinct 1, what is the probability that candidate
A will receive at least 55% of their votes?

Solution Let $Y$ denote the number of voters at precinct 1 who vote for candidate A. We must
approximate $P(Y/n \ge .55)$ when $p$ is the probability that a randomly selected voter
from precinct 1 favors candidate A. If we think of the $n = 100$ voters at precinct 1 as
a random sample from the city, then $Y$ has a binomial distribution with $n = 100$ and
$p = .5$. We have seen that the fraction of voters who favor candidate A is

$$\frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

where $X_i = 1$ if the $i$th voter favors candidate A and $X_i = 0$ otherwise.
Because it is reasonable to assume that $X_i$, $i = 1, 2, \ldots, n$ are independent, the
central limit theorem implies that $\bar{X} = Y/n$ is approximately normally distributed
with mean $p = .5$ and variance $pq/n = (.5)(.5)/100 = .0025$. Therefore,

$$P\left(\frac{Y}{n} \ge .55\right) = P\left(\frac{Y/n - .5}{\sqrt{.0025}} \ge \frac{.55 - .50}{.05}\right) \approx P(Z \ge 1) = .1587$$

from Table 4, Appendix 3.
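The approximation in Example 7.10 can be compared with the exact binomial probability. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the exact value is obtained by summing the binomial probability function over y = 55, ..., 100.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
sd_fraction = sqrt(p * (1 - p) / n)  # standard deviation of Y/n, here .05

# Normal approximation used in the example: P(Y/n >= .55) ~ P(Z >= 1).
approx = 1 - NormalDist(mu=p, sigma=sd_fraction).cdf(0.55)

# Exact binomial probability P(Y >= 55).
exact = sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(55, n + 1))

print(round(approx, 4), round(exact, 4))
```

The exact probability comes out somewhat larger than .1587. The gap arises because a continuous curve is being used to approximate a discrete histogram, which is the issue the boundary adjustment discussed later in this section addresses.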


FIGURE 7.8 The normal approximation to the binomial distribution: n = 10 and p = .5

The normal approximation to binomial probabilities works well even for moderately
large $n$ as long as $p$ is not close to zero or one. A useful rule of thumb is that the
normal approximation to the binomial distribution is appropriate when $p \pm 3\sqrt{pq/n}$
lies in the interval (0, 1), that is, if

$$0 < p - 3\sqrt{pq/n} \quad \text{and} \quad p + 3\sqrt{pq/n} < 1.$$

In Exercise 7.70, you will show that a more convenient but equivalent criterion is
that the normal approximation is adequate if

$$n > 9\left(\frac{\text{larger of } p \text{ and } q}{\text{smaller of } p \text{ and } q}\right).$$

As you will see in Exercise 7.71, for some values of p, this criterion is sometimes met 
for moderate values of n. Especially for moderate values of n, substantial improvement 
in the approximation can be made by a slight adjustment on the boundaries used 
in the calculations. If we look at the segment of a binomial distribution graphed in 
Figure 7.8, we can see what happens when we try to approximate a discrete distribution 
represented by a histogram with a continuous density function. 
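Both forms of the rule of thumb stated above are easy to evaluate for a given n and p. The sketch below implements each form separately so that their agreement can be checked on a few cases; the function names `interval_rule` and `ratio_rule` are illustrative assumptions.

```python
from math import sqrt


def interval_rule(n, p):
    """True if p +/- 3*sqrt(pq/n) lies inside the interval (0, 1)."""
    q = 1 - p
    half_width = 3 * sqrt(p * q / n)
    return 0 < p - half_width and p + half_width < 1


def ratio_rule(n, p):
    """True if n > 9 * (larger of p and q) / (smaller of p and q)."""
    q = 1 - p
    return n > 9 * max(p, q) / min(p, q)


for n, p in [(25, 0.4), (5, 0.1), (100, 0.1), (10, 0.5)]:
    print(n, p, interval_rule(n, p), ratio_rule(n, p))
```

For p = .5 the bound is n > 9, while for p = .1 it is n > 81, which shows how quickly the required sample size grows as p moves toward zero or one.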

If we want to find $P(Y \le 3)$ by using the binomial distribution, we can find the total
area in the four rectangles (above 0, 1, 2, and 3) illustrated in the binomial histogram
(Figure 7.8). Notice that the total area in the rectangles can be approximated by an
area under the normal curve. The area under the curve includes some areas not in the
histogram and excludes the portion of the histogram that lies above the curve. If we
want to approximate $P(Y \le 3)$ by calculating an area under the density function,
the area under the density function to the left of 3.5 provides a better approximation
than does the area to the left of 3.0. The following example illustrates how close the
normal approximation is for a case in which some exact binomial probabilities can
be found.


EXAMPLE 7.11 Suppose that $Y$ has a binomial distribution with $n = 25$ and $p = .4$. Find the exact
probabilities that $Y \le 8$ and $Y = 8$ and compare these to the corresponding values
found by using the normal approximation.

Solution From Table 1, Appendix 3, we find that

$$P(Y \le 8) = .274$$

and

$$P(Y = 8) = P(Y \le 8) - P(Y \le 7) = .274 - .154 = .120.$$

As previously stated, we can think of $Y$ as having approximately the same
distribution as $W$, where $W$ is normally distributed with $\mu_W = np$ and $\sigma_W^2 = np(1 - p)$.
Because we want $P(Y \le 8)$, we look at the normal curve area to the left of 8.5. Thus,

$$P(Y \le 8) \approx P(W \le 8.5) = P\left[\frac{W - np}{\sqrt{np(1 - p)}} \le \frac{8.5 - 10}{\sqrt{25(.4)(.6)}}\right] = P(Z \le -.61) = .2709$$

from Table 4, Appendix 3. This approximate value is close to the exact value for
$P(Y \le 8) = .274$, obtained from the binomial tables.

To find the normal approximation to the binomial probability $p(8)$, we will find
the area under the normal curve between the points 7.5 and 8.5 because this is the
interval included in the histogram bar over $y = 8$ (see Figure 7.9).

FIGURE 7.9 P(Y = 8) for binomial distribution of Example 7.11

Because $Y$ has approximately the same distribution as $W$, where $W$ is normally
distributed with $\mu_W = np = 25(.4) = 10$ and $\sigma_W^2 = np(1 - p) = 25(.4)(.6) = 6$, it
follows that

$$P(Y = 8) \approx P(7.5 \le W \le 8.5) = P\left(\frac{7.5 - 10}{\sqrt{6}} \le \frac{W - 10}{\sqrt{6}} \le \frac{8.5 - 10}{\sqrt{6}}\right) = P(-1.02 \le Z \le -.61) = .2709 - .1539 = .1170.$$

Again, we see that this approximate value is very close to the actual value,
$P(Y = 8) = .120$, calculated earlier.

In the above example, we used an area under a normal curve to approximate
$P(Y \le 8)$ and $P(Y = 8)$ when $Y$ had a binomial distribution with $n = 25$ and
$p = .4$. To improve the approximation, .5 was added to the largest value of interest
(8) when we used the approximation $P(Y \le 8) \approx P(W \le 8.5)$ and $W$ had an
appropriate normal distribution. Had we been interested in approximating $P(Y \ge 6)$,
we would have used $P(Y \ge 6) \approx P(W \ge 5.5)$; that is, we would have subtracted .5
from the smallest value of interest (6). The .5 that we added to the largest value of
interest (making it a little larger) and subtracted from the smallest value of interest
(making it a little smaller) is commonly called the continuity correction associated
with the normal approximation. The only time that this continuity correction is used
in this text is when we approximate a binomial (discrete) distribution with a normal
(continuous) distribution.
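The comparisons in Example 7.11 can be reproduced without tables. The following is a minimal sketch using only the Python standard library; the helper `binom_pmf` and the variable names are illustrative assumptions. Because the code does not round the z-values to two decimals, the approximate probabilities differ slightly in the third or fourth decimal place from the table-based values .2709 and .1170.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(y, n, p):
    """Binomial probability function p(y) = C(n, y) p^y (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 25, 0.4
# W is normal with mean np = 10 and variance np(1 - p) = 6.
w = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact_cdf = sum(binom_pmf(y, n, p) for y in range(0, 9))  # P(Y <= 8)
approx_cdf = w.cdf(8.5)                                   # P(W <= 8.5), continuity corrected
exact_pmf = binom_pmf(8, n, p)                            # P(Y = 8)
approx_pmf = w.cdf(8.5) - w.cdf(7.5)                      # P(7.5 <= W <= 8.5)

print(round(exact_cdf, 4), round(approx_cdf, 4))
print(round(exact_pmf, 4), round(approx_pmf, 4))
```

Replacing 8.5 with 8.0 in `approx_cdf` shows how much accuracy is lost when the continuity correction is omitted.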


Exercises 


Applet Exercise Access the applet Normal Approximation to Binomial Distribution (at
www.thomsonedu.com/statistics/wackerly). When the applet is started, it displays the details in
Example 7.11 and Figure 7.9. Initially, the display contains only the binomial histogram and
the exact value (calculated using the binomial probability function) for $p(8) = P(Y = 8)$.
Scroll down a little and click the button “Toggle Normal Approximation” to overlay the normal
density with mean 10 and standard deviation $\sqrt{6} = 2.449$, the same mean and standard
deviation as the binomial random variable $Y$. You will get a graph superior to the one in
Figure 7.9.


a How many probability mass or density functions are displayed? 


b Enter 0 in the box labeled “Begin” and press the enter key. What probabilities do you 
obtain? 


c Refer to part (b). On the line where the approximating normal probability is displayed, you
see the expression 


Normal: P(—0.5 <= k <= 8.5) = 0.2701. 


Why are the .5s in this expression? 


Applet Exercise Suppose that $Y$ has a binomial distribution with $n = 5$ and $p = .10$.


a Use the Normal Approximation to Binomial Distribution applet to find exact and approximate
values for $P(Y \le 1)$.


b The normal approximation is not particularly good. Why? 


Applet Exercise Refer to Exercise 7.65. In that case, $P(Y \le 1) = P(|Y - E(Y)| < 1)$.
If $p = .10$, use the applet Normal Approximation to Binomial Distribution to search for the
smallest $n$ so that the exact value and the normal approximation for $P(|Y - E(Y)| < 1)$ differ
by less than .01.


Applet Exercise Suppose that $Y$ has a binomial distribution with $p = .20$.


a Use the applet Normal Approximation to Binomial Distribution to compute the exact and
approximate values of $P(Y \le \mu + 3)$ for $n = 5$, 10, 15, and 20. For each sample size, pay
attention to the shapes of the binomial histograms and to how close the approximations are
to the exact binomial probabilities.

b Refer to part (a). What did you notice about the shapes of the binomial histograms as the
sample size increased? What did you notice about the differences between the exact and
approximate values of $P(Y \le \mu + 3)$ as the sample size increased?

c According to the rule of thumb for the adequacy of the normal approximation, how large
must $n$ be for the approximation to be adequate? Is this consistent with what you observed
in parts (a) and (b)?


Applet Exercise In 2004 Florida was hit by four major hurricanes. In 2005 a survey indicated 
that, in 2004, 48% of the households in Florida had no plans for escaping an approaching 
hurricane. Suppose that a recent random sample of 50 households was selected in Gainesville 
and that those in 29 of the households indicated that their household had a hurricane escape 
plan. 


a If the 2004 state percentages still apply to recent Gainesville households, use the Normal
Approximation to Binomial Distribution applet to find the exact and approximate values of
the probability that 29 or more of the households sampled have a hurricane escape plan.

b Refer to part (a). Is the normal approximation close to the exact binomial probability?
Explain why.


Refer to Exercise 7.68. 


a Based on your answer to Exercise 7.68(a), do you think that the 2004 Florida percentages
still apply to recent Gainesville households?

b Let $Y$ be the number of Gainesville households that have a hurricane escape plan in a sample
of size 50. Use the applet Normal Approximation to Binomial Distribution to determine
the value of $b$ so that $P(Y \ge b)$ is small enough to allow you to conclude that the 2004
Florida percentages do not apply to recent Gainesville households.


In this section, we provided the rule of thumb that the normal approximation to the binomial
distribution is adequate if $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1); that is, if

\[
0 < p - 3\sqrt{pq/n} \quad\text{and}\quad p + 3\sqrt{pq/n} < 1.
\]

a Show that

\[
p + 3\sqrt{pq/n} < 1 \quad\text{if and only if}\quad n > 9(p/q).
\]

b Show that

\[
0 < p - 3\sqrt{pq/n} \quad\text{if and only if}\quad n > 9(q/p).
\]

c Combine the results from parts (a) and (b) to obtain that the normal approximation to the
binomial is adequate if

\[
n > 9\left(\frac{p}{q}\right) \quad\text{and}\quad n > 9\left(\frac{q}{p}\right),
\]

or, equivalently,

\[
n > 9\left(\frac{\text{larger of } p \text{ and } q}{\text{smaller of } p \text{ and } q}\right).
\]


Refer to Exercise 7.70.


a For what values of $n$ will the normal approximation to the binomial distribution be adequate
if $p = .5$?
b Answer the question in part (a) if $p = .6$, .4, .8, .2, .99, and .001.


A machine is shut down for repairs if a random sample of 100 items selected from the daily 
output of the machine reveals at least 15% defectives. (Assume that the daily output is a large 
number of items.) If on a given day the machine is producing only 10% defective items, what 
is the probability that it will be shut down? [Hint: Use the .5 continuity correction.] 


An airline finds that 5% of the persons who make reservations on a certain flight do not show 
up for the flight. If the airline sells 160 tickets for a flight with only 155 seats, what is the 
probability that a seat will be available for every person holding a reservation and planning 
to fly? 


According to a survey conducted by the American Bar Association, 1 in every 410 Americans
is a lawyer, but 1 in every 64 residents of Washington, D.C., is a lawyer.


a If you select a random sample of 1500 Americans, what is the approximate probability that
the sample contains at least one lawyer?

b If the sample is selected from among the residents of Washington, D.C., what is the
approximate probability that the sample contains more than 30 lawyers?

c If you stand on a Washington, D.C., street corner and interview the first 1000 persons who
walked by and 30 say that they are lawyers, does this suggest that the density of lawyers
passing the corner exceeds the density within the city? Explain.


A pollster believes that 20% of the voters in a certain area favor a bond issue. If 64 voters are 
randomly sampled from the large number of voters in this area, approximate the probability 
that the sampled fraction of voters favoring the bond issue will not differ from the true fraction 
by more than .06. 


a Show that the variance of $Y/n$, where $Y$ has a binomial distribution with $n$ trials and a
success probability of $p$, has a maximum at $p = .5$, for fixed $n$.

b A random sample of $n$ items is to be selected from a large lot, and the number of defectives
$Y$ is to be observed. What value of $n$ guarantees that $Y/n$ will be within .1 of the true
fraction of defectives, with probability .95?


The manager of a supermarket wants to obtain information about the proportion of customers 
who dislike a new policy on cashing checks. How many customers should he sample if he 
wants the sample fraction to be within .15 of the true fraction, with probability .98? 


If the supermarket manager (Exercise 7.77) samples $n = 50$ customers and if the true fraction
of customers who dislike the policy is approximately .9, find the probability that the sample 
fraction will be within .15 unit of the true fraction. 


Suppose that a random sample of 25 items is selected from the machine of Exercise 7.72. If
the machine produces 10% defectives, find the probability that the sample will contain at least
two defectives, by using the following methods:


a The normal approximation to the binomial 


b The exact binomial tables 


The median age of residents of the United States is 31 years. If a survey of 100 randomly
selected U.S. residents is to be taken, what is the approximate probability that at least 60 will
be under 31 years of age?


A lot acceptance sampling plan for large lots specifies that 50 items be randomly selected and
that the lot be accepted if no more than 5 of the items selected do not conform to specifications.


a What is the approximate probability that a lot will be accepted if the true proportion of 
nonconforming items in the lot is .10? 


b Answer the question in part (a) if the true proportion of nonconforming items in the lot is 
.20 and .30. 


The quality of computer disks is measured by the number of missing pulses. Brand X is such 
that 80% of the disks have no missing pulses. If 100 disks of brand X are inspected, what is 
the probability that 15 or more contain missing pulses? 


Applet Exercise Vehicles entering an intersection from the east are equally likely to turn left, 
turn right, or proceed straight ahead. If 50 vehicles enter this intersection from the east, use 
the applet Normal Approximation to Binomial Distribution to find the exact and approximate 
probabilities that 


a 15 or fewer turn right.
b at least two-thirds of those in the sample turn.


Just as the difference between two sample means is normally distributed for large samples, so is
the difference between two sample proportions. That is, if $Y_1$ and $Y_2$ are independent binomial
random variables with parameters $(n_1, p_1)$ and $(n_2, p_2)$, respectively, then $(Y_1/n_1) - (Y_2/n_2)$
is approximately normally distributed for large values of $n_1$ and $n_2$.


a Find $E\left(\dfrac{Y_1}{n_1} - \dfrac{Y_2}{n_2}\right)$.

b Find $V\left(\dfrac{Y_1}{n_1} - \dfrac{Y_2}{n_2}\right)$.


As a check on the relative abundance of certain species of fish in two lakes, $n = 50$ observations
are taken on results of net trapping in each lake. For each observation, the experimenter merely 
records whether the desired species was present in the trap. Past experience has shown that this 
species appears in lake A traps approximately 10% of the time and in lake B traps approximately 
20% of the time. Use these results to approximate the probability that the difference between 
the sample proportions will be within .1 of the difference between the true proportions. 


An auditor samples 100 of a firm’s travel vouchers to ascertain what percentage of the whole 
set of vouchers are improperly documented. What is the approximate probability that more 
than 30% of the sampled vouchers are improperly documented if, in fact, only 20% of all the 
vouchers are improperly documented? If you were the auditor and observed more than 30% 
with improper documentation, what would you conclude about the firm’s claim that only 20% 
suffered from improper documentation? Why? 


The times to process orders at the service counter of a pharmacy are exponentially distributed 
with mean 10 minutes. If 100 customers visit the counter in a 2-day period, what is the 
probability that at least half of them need to wait more than 10 minutes? 


Summary 


To make inferences about population parameters, we need to know the probability
distributions for certain statistics, functions of the observable random variables
in the sample (or samples). These probability distributions provide models for the
relative frequency behavior of the statistics in repeated sampling; consequently, they
are referred to as sampling distributions. We have seen that the normal, $\chi^2$, $t$, and
$F$ distributions provide models for the sampling distributions of statistics used to
make inferences about the parameters associated with normal distributions. For your
convenience, Table 7.2 contains a summary of the R (or S-Plus) commands that
provide probabilities and quantiles associated with these distributions.


Table 7.2 R (and S-Plus) procedures giving probabilities and percentiles for normal, $\chi^2$, $t$, and $F$
distributions.

Distribution                        P(Y ≤ y0)           pth Quantile: φp Such That P(Y ≤ φp) = p
Normal (μ, σ)                       pnorm(y0, μ, σ)     qnorm(p, μ, σ)
χ² with ν df                        pchisq(y0, ν)       qchisq(p, ν)
t with ν df                         pt(y0, ν)           qt(p, ν)
F with ν1 num. df, ν2 denom. df     pf(y0, ν1, ν2)      qf(p, ν1, ν2)
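
For readers working outside R, the normal row of Table 7.2 has a close counterpart in the Python standard library. The sketch below wraps `statistics.NormalDist` in two helper functions whose names, `pnorm` and `qnorm`, are chosen here only to mirror the R commands; they are not part of Python itself. The standard library has no direct equivalents for the $\chi^2$, $t$, and $F$ rows, so those would need a third-party package.

```python
from statistics import NormalDist


def pnorm(y0, mu=0.0, sigma=1.0):
    """P(Y <= y0) for a normal variable with mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma).cdf(y0)


def qnorm(p, mu=0.0, sigma=1.0):
    """The pth quantile: the value phi_p such that P(Y <= phi_p) = p."""
    return NormalDist(mu, sigma).inv_cdf(p)


# Standard normal checks against familiar table values.
print(round(pnorm(1.96), 4))   # cumulative probability below 1.96
print(round(qnorm(0.975), 2))  # quantile cutting off an upper-tail area of .025
```

As in R, the third argument is the standard deviation, not the variance.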

When the sample size is large, the sample mean $\bar{Y}$ possesses an approximately
normal distribution if the random sample is taken from any distribution with a finite
mean $\mu$ and a finite variance $\sigma^2$. This result, known as the central limit theorem, also
provides the justification for approximating binomial probabilities with corresponding
probabilities associated with the normal distribution.
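
The central limit theorem can be illustrated with a small simulation. The following sketch draws repeated samples from an exponential distribution (a strongly skewed population) and compares the simulated tail probability of the sample mean with the normal approximation. The population mean of 10, the sample size of 50, the cutoff of 12, the number of repetitions, and the random seed are all assumptions chosen for illustration, not values taken from the text.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is reproducible

theta = 10.0    # exponential mean; its standard deviation is also theta
n = 50          # size of each sample
reps = 10000    # number of repeated samples
cutoff = 12.0   # we estimate P(sample mean > cutoff)

# Each entry is the mean of one sample of n exponential observations.
sample_means = [
    sum(random.expovariate(1 / theta) for _ in range(n)) / n
    for _ in range(reps)
]

# Relative frequency of sample means above the cutoff in repeated sampling.
simulated = sum(m > cutoff for m in sample_means) / reps

# Central limit theorem: the sample mean is approximately
# normal with mean theta and standard deviation theta / sqrt(n).
clt_value = 1 - NormalDist(theta, theta / sqrt(n)).cdf(cutoff)

print(f"simulated: {simulated:.4f}, normal approximation: {clt_value:.4f}")
```

The two values should be close but not identical, since the approximation improves as the sample size grows and the exponential population is far from symmetric.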

The sampling distributions developed in this chapter will be used in the inference-making
procedures presented in subsequent chapters.


Supplementary Exercises


The efficiency (in lumens per watt) of light bulbs of a certain type has population mean 9.5 and
standard deviation .5, according to production specifications. The specifications for a room in
which eight of these bulbs are to be installed call for the average efficiency of the eight bulbs
to exceed 10. Find the probability that this specification for the room will be met, assuming
that efficiency measurements are normally distributed.


Refer to Exercise 7.88. What should be the mean efficiency per bulb if the specification for 
the room is to be met with a probability of approximately .80? (Assume that the variance of 
efficiency measurements remains at .5.) 


Briggs and King developed the technique of nuclear transplantation in which the nucleus of 
a cell from one of the later stages of an embryo’s development is transplanted into a zygote 
(a single-cell, fertilized egg) to see if the nucleus can support normal development. If the 
probability that a single transplant from the early gastrula stage will be successful is .65, what 
is the probability that more than 70 transplants out of 100 will be successful? 


A retail dealer sells three brands of automobiles. For brand A, her profit per sale, $X$, is normally
distributed with parameters $(\mu_1, \sigma_1^2)$; for brand B, her profit per sale $Y$ is normally distributed
with parameters $(\mu_2, \sigma_2^2)$; for brand C, her profit per sale $W$ is normally distributed with
parameters $(\mu_3, \sigma_3^2)$. For the year, two-fifths of the dealer's sales are of brand A, one-fifth of
brand B, and the remaining two-fifths of brand C. If you are given data on profits for $n_1$, $n_2$,
and $n_3$ sales of brands A, B, and C, respectively, the quantity $U = .4\bar{X} + .2\bar{Y} + .4\bar{W}$ will
approximate the true average profit per sale for the year. Find the mean, variance, and
probability density function for $U$. Assume that $X$, $Y$, and $W$ are independent.


From each of two normal populations with identical means and with standard deviations of 
6.40 and 7.20, independent random samples of 64 observations are drawn. Find the probability 
that the difference between the means of the samples exceeds .6 in absolute value. 


If $Y$ has an exponential distribution with mean $\theta$, show that $U = 2Y/\theta$ has a $\chi^2$ distribution
with 2 df.


A plant supervisor is interested in budgeting weekly repair costs for a certain type of machine. 
Records over the past years indicate that these repair costs have an exponential distribution with 
mean 20 for each machine studied. Let $Y_1, Y_2, \ldots, Y_5$ denote the repair costs for five of these
machines for the next week. Find a number $c$ such that $P\left(\sum_{i=1}^{5} Y_i > c\right) = .05$, assuming that
the machines operate independently. [Hint: Use the result given in Exercise 7.93.]


The coefficient of variation (CV) for a sample of values $Y_1, Y_2, \ldots, Y_n$ is defined by

\[
CV = S/\bar{Y}.
\]

This quantity, which gives the standard deviation as a proportion of the mean, is sometimes
informative. For example, the value $S = 10$ has little meaning unless we can compare it to
something else. If $S$ is observed to be 10 and $\bar{Y}$ is observed to be 1000, the amount of variation is
small relative to the size of the mean. However, if $S$ is observed to be 10 and $\bar{Y}$ is observed to be
5, the variation is quite large relative to the size of the mean. If we were studying the precision
(variation in repeated measurements) of a measuring instrument, the first case ($CV = 10/1000$)
might provide acceptable precision, but the second case ($CV = 2$) would be unacceptable.

Let $Y_1, Y_2, \ldots, Y_{10}$ denote a random sample of size 10 from a normal distribution with
mean 0 and variance $\sigma^2$. Use the following steps to find the number $c$ such that

\[
P\left(-c \le \frac{S}{\bar{Y}} \le c\right) = .95.
\]

a Use the result of Exercise 7.33 to find the distribution of $(10)\bar{Y}^2/S^2$.

b Use the result of Exercise 7.29 to find the distribution of $S^2/[(10)\bar{Y}^2]$.

c Use the answer to (b) to find the constant $c$.


Suppose that $Y_1, Y_2, \ldots, Y_{40}$ denote a random sample of measurements on the proportion of
impurities in iron ore samples. Let each variable $Y_i$ have a probability density function given by

\[
f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

The ore is to be rejected by the potential buyer if $\bar{Y}$ exceeds .7. Find $P(\bar{Y} > .7)$ for the sample
of size 40.


Let $X_1, X_2, \ldots, X_n$ be independent $\chi^2$-distributed random variables, each with 1 df. Define
$Y$ as

\[
Y = \sum_{i=1}^{n} X_i.
\]

It follows from Exercise 6.59 that $Y$ has a $\chi^2$ distribution with $n$ df.


a Use the preceding representation of $Y$ as the sum of the $X$'s to show that $Z = (Y - n)/\sqrt{2n}$
has an asymptotic standard normal distribution.


b A machine in a heavy-equipment factory produces steel rods of length $Y$, where $Y$ is a
normally distributed random variable with mean 6 inches and variance .2. The cost $C$ of
repairing a rod that is not exactly 6 inches in length is proportional to the square of the
error and is given, in dollars, by $C = 4(Y - \mu)^2$. If 50 rods with independent lengths are
produced in a given day, approximate the probability that the total cost for repairs for that
day exceeds $48.


Suppose that $T$ is defined as in Definition 7.2.


a If $W$ is fixed at $w$, then $T$ is given by $Z/c$, where $c = \sqrt{w/\nu}$. Use this idea to find the
conditional density of $T$ for a fixed $W = w$.

b Find the joint density of $T$ and $W$, $f(t, w)$, by using $f(t, w) = f(t \mid w) f(w)$.

c Integrate over $w$ to show that

\[
f(t) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}, \qquad -\infty < t < \infty.
\]


Suppose $F$ is defined as in Definition 7.3.


a If $W_2$ is fixed at $w_2$, then $F = W_1/c$, where $c = w_2\nu_1/\nu_2$. Find the conditional density of
$F$ for fixed $W_2 = w_2$.

b Find the joint density of $F$ and $W_2$.

c Integrate over $w_2$ to show that the probability density function of $F$, say $g(y)$, is given by

\[
g(y) = \frac{\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\; y^{(\nu_1/2) - 1} \left(1 + \frac{\nu_1 y}{\nu_2}\right)^{-(\nu_1 + \nu_2)/2}, \qquad 0 < y < \infty.
\]


Let $X$ have a Poisson distribution with parameter $\lambda$.


a Show that the moment-generating function of $Y = (X - \lambda)/\sqrt{\lambda}$ is given by

\[
m_Y(t) = \exp\left(\lambda e^{t/\sqrt{\lambda}} - \sqrt{\lambda}\,t - \lambda\right).
\]

b Use the expansion

\[
e^{t/\sqrt{\lambda}} = \sum_{i=0}^{\infty} \frac{\left[t/\sqrt{\lambda}\right]^i}{i!}
\]

to show that

\[
\lim_{\lambda \to \infty} m_Y(t) = e^{t^2/2}.
\]

c Use Theorem 7.5 to show that the distribution function of $Y$ converges to a standard normal
distribution function as $\lambda \to \infty$.


In the interest of pollution control, an experimenter wants to count the number of bacteria 
per small volume of water. Let X denote the bacteria count per cubic centimeter of water and 
assume that $X$ has a Poisson probability distribution with mean $\lambda = 100$. If the allowable
pollution in a water supply is a count of 110 per cubic centimeter, approximate the probability 
that X will be at most 110. [Hint: Use the result in Exercise 7.100(c).] 


Y, the number of accidents per year at a given intersection, is assumed to have a Poisson 
distribution. Over the past few years, an average of 36 accidents per year have occurred at this 
intersection. If the number of accidents per year is at least 45, an intersection can qualify to 
be redesigned under an emergency program set up by the state. Approximate the probability 
that the intersection in question will come under the emergency program at the end of the next 
year. 


An experimenter is comparing two methods for removing bacteria colonies from processed 
luncheon meats. After treating some samples by method A and other identical samples by 
method B, the experimenter selects a 2-cubic-centimeter subsample from each sample and 
makes bacteria colony counts on these subsamples. Let X denote the total count for the sub- 
samples treated by method A and let Y denote the total count for the subsamples treated by 
method B. Assume that $X$ and $Y$ are independent Poisson random variables with means $\lambda_1$
and $\lambda_2$, respectively. If $X$ exceeds $Y$ by more than 10, method B will be judged superior to A.
Suppose that, in fact, $\lambda_1 = \lambda_2 = 50$. Find the approximate probability that method B will be
judged superior to method A.


Let $Y_n$ be a binomial random variable with $n$ trials and with success probability $p$. Suppose
that $n$ tends to infinity and $p$ tends to zero in such a way that $np$ remains fixed at $np = \lambda$. Use
the result in Theorem 7.5 to prove that the distribution of $Y_n$ converges to a Poisson distribution
with mean $\lambda$.


If the probability that a person will suffer an adverse reaction from a medication is .001, use
the result of Exercise 7.104 to approximate the probability that 2 or more persons will suffer
an adverse reaction if the medication is administered to 1000 individuals.


Introduction


As stated in Chapter 1, the purpose of statistics is to use the information contained 
in a sample to make inferences about the population from which the sample is taken. 
Because populations are characterized by numerical descriptive measures called 
parameters, the objective of many statistical investigations is to estimate the value of 
one or more relevant parameters. As you will see, the sampling distributions derived 
in Chapter 7 play an important role in the development of the estimation procedures 
that are the focus of this chapter. 

Estimation has many practical applications. For example, a manufacturer of washing
machines might be interested in estimating the proportion $p$ of washers that can
be expected to fail prior to the expiration of a 1-year guarantee time. Other important
population parameters are the population mean, variance, and standard deviation. For
example, we might wish to estimate the mean waiting time $\mu$ at a supermarket checkout
station or the standard deviation of the error of measurement $\sigma$ of an electronic
instrument. To simplify our terminology, we will call the parameter of interest in the
experiment the target parameter.

Suppose that we wish to estimate the average amount of mercury $\mu$ that a newly
developed process can remove from 1 ounce of ore obtained at a geographic location.
We could give our estimate in two distinct forms. First, we could use a single number
(for instance, .13 ounce) that we think is close to the unknown population mean $\mu$.
This type of estimate is called a point estimate because a single value, or point, is given
as the estimate of $\mu$. Second, we might say that $\mu$ will fall between two numbers, for
example, between .07 and .19 ounce. In this second type of estimation procedure,
the two values that we give may be used to construct an interval (.07, .19) that is
intended to enclose the parameter of interest; thus, the estimate is called an interval
estimate.

The information in the sample can be used to calculate the value of a point estimate, 
an interval estimate, or both. In any case, the actual estimation is accomplished by 
using an estimator for the target parameter. 


An estimator is a rule, often expressed as a formula, that tells how to calculate 
the value of an estimate based on the measurements contained in a sample. 


For example, the sample mean

\[
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
\]

is one possible point estimator of the population mean $\mu$. Clearly, the expression for
$\bar{Y}$ is both a rule and a formula. It tells us to sum the sample observations and divide
by the sample size $n$.
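
The distinction between an estimator (the rule) and an estimate (the number the rule produces for one sample) can be made concrete in code. In the sketch below, the function `sample_mean` plays the role of the estimator, and the value it returns is a point estimate. The five measurements are hypothetical values invented for illustration, loosely echoing the mercury example above.

```python
def sample_mean(sample):
    """Estimator: sum the sample observations and divide by the sample size n."""
    if not sample:
        raise ValueError("the sample must contain at least one observation")
    return sum(sample) / len(sample)


# Hypothetical measurements (ounces of mercury removed per ounce of ore).
measurements = [0.11, 0.15, 0.12, 0.14, 0.13]

# Applying the rule to one particular sample yields one point estimate.
estimate = sample_mean(measurements)
print(f"point estimate of the population mean: {estimate:.3f}")
```

A different sample would give a different estimate from the same estimator, which is why the goodness of an estimator has to be judged through its behavior in repeated sampling.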

An experimenter who wants an interval estimate of a parameter must use the 
sample data to calculate two values, chosen so that the interval formed by the two 
values includes the target parameter with a specified probability. Examples of interval 
estimators will be given in subsequent sections. 

Many different estimators (rules for estimating) may be obtained for the same 
population parameter. This should not be surprising. Ten engineers, each assigned to 
estimate the cost of a large construction job, could use different methods of estimation 
and thereby arrive at different estimates of the total cost. Such engineers, called 
estimators in the construction industry, base their estimates on specified fixed guide- 
lines and intuition. Each estimator represents a unique human subjective rule for 
obtaining a single estimate. This brings us to a most important point: Some estima- 
tors are considered good, and others, bad. The management of a construction firm 
must define good and bad as they relate to the estimation of the cost of a job. How 
can we establish criteria of goodness to compare statistical estimators? The following 
sections contain some answers to this question.


FIGURE 8.1 A distribution of estimates


FIGURE 8.2 Sampling distribution for a positively biased estimator


The Bias and Mean Square Error of Point Estimators


Point estimation is similar, in many respects, to firing a revolver at a target. The 
estimator, generating estimates, is analogous to the revolver; a particular estimate is 
comparable to one shot; and the parameter of interest corresponds to the bull’s-eye. 
Drawing a single sample from the population and using it to compute an estimate for 
the value of the parameter corresponds to firing a single shot at the bull’s-eye. 

Suppose that a man fires a single shot at a target and that shot pierces the bull’s- 
eye. Do we conclude that he is an excellent shot? Would you want to hold the target 
while a second shot is fired? Obviously, we would not decide that the man is an expert 
marksperson based on such a small amount of evidence. On the other hand, if 100 shots 
in succession hit the bull’s-eye, we might acquire sufficient confidence in the marks- 
person and consider holding the target for the next shot if the compensation was 
adequate. The point is that we cannot evaluate the goodness of a point estimation 
procedure on the basis of the value of a single estimate; rather, we must observe 
the results when the estimation procedure is used many, many times. Because the 
estimates are numbers, we evaluate the goodness of the point estimator by constructing 
a frequency distribution of the values of the estimates obtained in repeated sampling 
and note how closely this distribution clusters about the target parameter. 

Suppose that we wish to specify a point estimate for a population parameter that
we will call $\theta$. The estimator of $\theta$ will be indicated by the symbol $\hat{\theta}$, read as "$\theta$ hat."
The "hat" indicates that we are estimating the parameter immediately beneath it.
With the revolver-firing example in mind, we can say that it is highly desirable for
the distribution of estimates (or, more properly, the sampling distribution of the
estimator) to cluster about the target parameter as shown in Figure 8.1. In other
words, we would like the mean or expected value of the distribution of estimates to
equal the parameter estimated; that is, $E(\hat{\theta}) = \theta$. Point estimators that satisfy this
property are said to be unbiased. The sampling distribution for a positively biased
point estimator, one for which $E(\hat{\theta}) > \theta$, is shown in Figure 8.2.


DEFINITION 8.2 Let $\hat{\theta}$ be a point estimator for a parameter $\theta$. Then $\hat{\theta}$ is an unbiased estimator
if $E(\hat{\theta}) = \theta$. If $E(\hat{\theta}) \ne \theta$, $\hat{\theta}$ is said to be biased.


DEFINITION 8.3 The bias of a point estimator $\hat{\theta}$ is given by $B(\hat{\theta}) = E(\hat{\theta}) - \theta$.


Figure 8.3 shows two possible sampling distributions for unbiased point estimators
for a target parameter $\theta$. We would prefer that our estimator have the type of
distribution indicated in Figure 8.3(b) because the smaller variance guarantees that
in repeated sampling a higher fraction of values of $\hat{\theta}_2$ will be "close" to $\theta$. Thus,
in addition to preferring unbiasedness, we want the variance of the distribution of
the estimator $V(\hat{\theta})$ to be as small as possible. Given two unbiased estimators of a
parameter $\theta$, and all other things being equal, we would select the estimator with the
smaller variance.

Rather than using the bias and variance of a point estimator to characterize its
goodness, we might employ $E[(\hat{\theta} - \theta)^2]$, the average of the square of the distance
between the estimator and its target parameter.


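DEFINITION 8.4 The mean square error of a point estimator $\hat{\theta}$ is

\[
\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2].
\]


The mean square error of an estimator $\hat{\theta}$, $\mathrm{MSE}(\hat{\theta})$, is a function of both its variance
and its bias. If $B(\hat{\theta})$ denotes the bias of the estimator $\hat{\theta}$, it can be shown that

\[
\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2.
\]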


We will leave the proof of this result as Exercise 8.1. 
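
Without giving away the proof, the relationship among mean square error, variance, and bias can be observed numerically. The sketch below is a simulation under assumed settings (a normal population with mean 0 and variance 4, samples of size 10, 5000 repetitions, a fixed seed); none of these values come from the text. It compares two estimators of the population variance, one dividing the sum of squared deviations by $n - 1$ and one dividing by $n$, and reports the empirical bias, variance, and mean square error of each.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(1)  # fixed seed so the run is reproducible

sigma2 = 4.0    # true population variance (the target parameter)
n = 10          # sample size
reps = 5000     # number of repeated samples


def variance_estimate(sample, divisor):
    """Sum of squared deviations about the sample mean, divided by `divisor`."""
    ybar = fmean(sample)
    return sum((y - ybar) ** 2 for y in sample) / divisor


def summarize(estimates, target):
    """Empirical bias, variance, and mean square error over repeated samples."""
    center = fmean(estimates)
    bias = center - target
    variance = fmean([(e - center) ** 2 for e in estimates])
    mse = fmean([(e - target) ** 2 for e in estimates])
    return bias, variance, mse


samples = [[random.gauss(0.0, sqrt(sigma2)) for _ in range(n)] for _ in range(reps)]

# Divisor n - 1 gives the usual sample variance; divisor n gives a biased alternative.
bias_u, var_u, mse_u = summarize([variance_estimate(s, n - 1) for s in samples], sigma2)
bias_b, var_b, mse_b = summarize([variance_estimate(s, n) for s in samples], sigma2)

print(f"divisor n-1: bias={bias_u:.3f} var={var_u:.3f} mse={mse_u:.3f}")
print(f"divisor n:   bias={bias_b:.3f} var={var_b:.3f} mse={mse_b:.3f}")
```

In each printed row, the mean square error should match the variance plus the squared bias up to rounding, which is the relationship stated above applied to the empirical distribution of the estimates.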

In this section, we have defined properties of point estimators that are some- 
times desirable. In particular, we often seek unbiased estimators with relatively small 
variances. In the next section, we consider some common and useful unbiased point 
estimators. 


FIGURE 8.3 Sampling distributions for two unbiased estimators: (a) estimator with
large variation; (b) estimator with small variation


Exercises 


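Using the identity

\[
(\hat{\theta} - \theta) = [\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta] = [\hat{\theta} - E(\hat{\theta})] + B(\hat{\theta}),
\]

show that

\[
\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + (B(\hat{\theta}))^2.
\]


a If $\hat{\theta}$ is an unbiased estimator for $\theta$, what is $B(\hat{\theta})$?
b If $B(\hat{\theta}) = 5$, what is $E(\hat{\theta})$?


Suppose that $\hat{\theta}$ is an estimator for a parameter $\theta$ and $E(\hat{\theta}) = a\theta + b$ for some nonzero constants
$a$ and $b$.
a In terms of $a$, $b$, and $\theta$, what is $B(\hat{\theta})$?
b Find a function of $\hat{\theta}$, say $\hat{\theta}^*$, that is an unbiased estimator for $\theta$.


Refer to Exercise 8.1.


a If $\hat{\theta}$ is an unbiased estimator for $\theta$, how does $\mathrm{MSE}(\hat{\theta})$ compare to $V(\hat{\theta})$?
b If $\hat{\theta}$ is a biased estimator for $\theta$, how does $\mathrm{MSE}(\hat{\theta})$ compare to $V(\hat{\theta})$?


Refer to Exercise 8.1 and consider the unbiased estimator $\hat{\theta}^*$ that you proposed in
Exercise 8.3.


a Express $\mathrm{MSE}(\hat{\theta}^*)$ as a function of $V(\hat{\theta})$.

b Give an example of a value of $a$ for which $\mathrm{MSE}(\hat{\theta}^*) < \mathrm{MSE}(\hat{\theta})$.

c Give an example of values for $a$ and $b$ for which $\mathrm{MSE}(\hat{\theta}^*) > \mathrm{MSE}(\hat{\theta})$.


Suppose that $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $V(\hat{\theta}_1) = \sigma_1^2$, and $V(\hat{\theta}_2) = \sigma_2^2$. Consider the estimator
$\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$.

a Show that $\hat{\theta}_3$ is an unbiased estimator for $\theta$.


b If $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, how should the constant $a$ be chosen in order to minimize
the variance of $\hat{\theta}_3$?


Consider the situation described in Exercise 8.6. How should the constant $a$ be chosen to
minimize the variance of $\hat{\theta}_3$ if $\hat{\theta}_1$ and $\hat{\theta}_2$ are not independent but are such that
$\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2) = c \ne 0$?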


Suppose that $Y_1, Y_2, Y_3$ denote a random sample from an exponential distribution with density
function

\[
f(y) = \begin{cases} \dfrac{1}{\theta}\, e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

Consider the following five estimators of $\theta$:

\[
\hat{\theta}_1 = Y_1, \quad \hat{\theta}_2 = \frac{Y_1 + Y_2}{2}, \quad \hat{\theta}_3 = \frac{Y_1 + 2Y_2}{3}, \quad \hat{\theta}_4 = \min(Y_1, Y_2, Y_3), \quad \hat{\theta}_5 = \bar{Y}.
\]

a Which of these estimators are unbiased?

b Among the unbiased estimators, which has the smallest variance?


Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a population with probability
density function

\[
f(y) = \begin{cases} \dfrac{1}{\theta + 1}\, e^{-y/(\theta + 1)}, & y > 0,\ \theta > -1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Suggest a suitable statistic to use as an unbiased estimator for $\theta$. [Hint: Consider $\bar{Y}$.]


The number of breakdowns per week for a type of minicomputer is a random variable $Y$ with
a Poisson distribution and mean $\lambda$. A random sample $Y_1, Y_2, \ldots, Y_n$ of observations on the
weekly number of breakdowns is available.

a Suggest an unbiased estimator for $\lambda$.

b The weekly cost of repairing these breakdowns is $C = 3Y + Y^2$. Show that $E(C) = 4\lambda + \lambda^2$.

c Find a function of $Y_1, Y_2, \ldots, Y_n$ that is an unbiased estimator of $E(C)$. [Hint: Use what
you know about $\bar{Y}$ and $(\bar{Y})^2$.]


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a population with mean 3. Assume
that $\hat{\theta}_2$ is an unbiased estimator of $E(Y^2)$ and that $\hat{\theta}_3$ is an unbiased estimator of $E(Y^3)$. Give
an unbiased estimator for the third central moment of the underlying distribution.


The reading on a voltage meter connected to a test circuit is uniformly distributed over the
interval $(\theta, \theta + 1)$, where $\theta$ is the true but unknown voltage of the circuit. Suppose that
$Y_1, Y_2, \ldots, Y_n$ denote a random sample of such readings.

a Show that $\bar{Y}$ is a biased estimator of $\theta$ and compute the bias.

b Find a function of $\bar{Y}$ that is an unbiased estimator of $\theta$.

c Find $\mathrm{MSE}(\bar{Y})$ when $\bar{Y}$ is used as an estimator of $\theta$.

We have seen that if Y has a binomial distribution with parameters n and p, then Y/n is an
unbiased estimator of p. To estimate the variance of Y, we generally use $n(Y/n)(1 - Y/n)$.

a Show that the suggested estimator is a biased estimator of V(Y).

b Modify $n(Y/n)(1 - Y/n)$ slightly to form an unbiased estimator of V(Y).


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population whose density is given by

$$f(y) = \begin{cases} \alpha y^{\alpha - 1}/\theta^{\alpha}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere,} \end{cases}$$

where $\alpha > 0$ is a known, fixed value, but $\theta$ is unknown. (This is the power family distribution
introduced in Exercise 6.17.) Consider the estimator $\hat\theta = \max(Y_1, Y_2, \ldots, Y_n)$.

a Show that $\hat\theta$ is a biased estimator for $\theta$.
b Find a multiple of $\hat\theta$ that is an unbiased estimator of $\theta$.
c Derive $\mathrm{MSE}(\hat\theta)$.


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population whose density is given by

$$f(y) = \begin{cases} 3\beta^3 y^{-4}, & \beta \le y, \\ 0, & \text{elsewhere,} \end{cases}$$

where $\beta > 0$ is unknown. (This is one of the Pareto distributions introduced in Exercise 6.18.)
Consider the estimator $\hat\beta = \min(Y_1, Y_2, \ldots, Y_n)$.

a Derive the bias of the estimator $\hat\beta$.
b Derive $\mathrm{MSE}(\hat\beta)$.
Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal distribution with
parameters $\mu$ and $\sigma^2$.¹

a Show that $S = \sqrt{S^2}$ is a biased estimator of $\sigma$. [Hint: Recall the distribution of $(n - 1)S^2/\sigma^2$
and the result given in Exercise 4.112.]

b Adjust S to form an unbiased estimator of $\sigma$.

c Find an unbiased estimator of $\mu - z_\alpha \sigma$, the point that cuts off a lower-tail area of $\alpha$ under
this normal curve.


If Y has a binomial distribution with parameters n and p, then $\hat{p}_1 = Y/n$ is an unbiased
estimator of p. Another estimator of p is $\hat{p}_2 = (Y + 1)/(n + 2)$.

a Derive the bias of $\hat{p}_2$.
b Derive $\mathrm{MSE}(\hat{p}_1)$ and $\mathrm{MSE}(\hat{p}_2)$.
c For what values of p is $\mathrm{MSE}(\hat{p}_1) < \mathrm{MSE}(\hat{p}_2)$?


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with a uniform
distribution on the interval $(0, \theta)$. Consider $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$, the smallest-order statistic.
Use the methods of Section 6.7 to derive $E(Y_{(1)})$. Find a multiple of $Y_{(1)}$ that is an unbiased
estimator for $\theta$.


Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with an
exponential distribution whose density is given by

$$f(y) = \begin{cases} (1/\theta)\, e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

If $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ denotes the smallest-order statistic, show that $\hat\theta = nY_{(1)}$ is an
unbiased estimator for $\theta$ and find $\mathrm{MSE}(\hat\theta)$. [Hint: Recall the results of Exercise 6.81.]


Suppose that $Y_1, Y_2, Y_3, Y_4$ denote a random sample of size 4 from a population with an
exponential distribution whose density is given by

$$f(y) = \begin{cases} (1/\theta)\, e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Let $X = \sqrt{Y_1 Y_2}$. Find a multiple of X that is an unbiased estimator for $\theta$. [Hint: Use your
knowledge of the gamma distribution and the fact that $\Gamma(1/2) = \sqrt{\pi}$ to find $E(\sqrt{Y_1})$.
Recall that the variables $Y_i$ are independent.]

b Let $W = \sqrt{Y_1 Y_2 Y_3 Y_4}$. Find a multiple of W that is an unbiased estimator for $\theta^2$. [Recall
the hint for part (a).]


1. Exercises preceded by an asterisk are optional.


Some Common Unbiased Point Estimators


Some formal methods for deriving point estimators for target parameters are presented
in Chapter 9. In this section, we focus on some estimators that merit consideration
on the basis of intuition. For example, it seems natural to use the sample mean
$\bar{Y}$ to estimate the population mean $\mu$ and to use the sample proportion $\hat{p} = Y/n$
to estimate a binomial parameter p. If an inference is to be based on independent
random samples of $n_1$ and $n_2$ observations selected from two different populations,
how would we estimate the difference between means $(\mu_1 - \mu_2)$ or the difference in
two binomial parameters, $(p_1 - p_2)$? Again, our intuition suggests using the point
estimators $(\bar{Y}_1 - \bar{Y}_2)$, the difference in the sample means, to estimate $(\mu_1 - \mu_2)$ and
using $(\hat{p}_1 - \hat{p}_2)$, the difference in the sample proportions, to estimate $(p_1 - p_2)$.

Because the four estimators $\bar{Y}$, $\hat{p}$, $(\bar{Y}_1 - \bar{Y}_2)$, and $(\hat{p}_1 - \hat{p}_2)$ are functions of
the random variables observed in samples, we can find their expected values and
variances by using the expectation theorems of Sections 5.6 through 5.8. The standard
deviation of each of the estimators is simply the square root of the respective variance.
Such an effort would show that, when random sampling has been employed, all four
point estimators are unbiased and that they possess the standard deviations shown in
Table 8.1. To facilitate communication, we use the notation $\sigma^2_{\hat\theta}$ to denote the variance
of the sampling distribution of the estimator $\hat\theta$. The standard deviation of the sampling
distribution of the estimator $\hat\theta$, $\sigma_{\hat\theta} = \sqrt{\sigma^2_{\hat\theta}}$, is usually called the standard error of the
estimator $\hat\theta$.

In Chapter 5, we did much of the derivation required for Table 8.1. In particular, we
found the means and variances of $\bar{Y}$ and $\hat{p}$ in Examples 5.27 and 5.28, respectively.
If the random samples are independent, these results and Theorem 5.12 imply that

$$E(\bar{Y}_1 - \bar{Y}_2) = E(\bar{Y}_1) - E(\bar{Y}_2) = \mu_1 - \mu_2,$$

$$V(\bar{Y}_1 - \bar{Y}_2) = V(\bar{Y}_1) + V(\bar{Y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

The expected value and standard error of $(\hat{p}_1 - \hat{p}_2)$, shown in Table 8.1, can be
acquired similarly.
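
For the difference in sample proportions, the same argument can be written out as follows. This is a sketch that assumes, as in Table 8.1, two independent binomial samples with $Y_i$ successes in $n_i$ trials, and uses the notation $\hat{p}_i = Y_i/n_i$ and $q_i = 1 - p_i$:

```latex
\begin{align*}
E(\hat{p}_1 - \hat{p}_2) &= E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2, \\
V(\hat{p}_1 - \hat{p}_2) &= V(\hat{p}_1) + V(\hat{p}_2)
  = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}, \\
\sigma_{(\hat{p}_1 - \hat{p}_2)} &= \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.
\end{align*}
```

The variances add, rather than subtract, because the two samples are independent and $V(-\hat{p}_2) = V(\hat{p}_2)$.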


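Table 8.1 Expected values and standard errors of some common point estimators

Target               Sample             Point                                          Standard
Parameter            Size(s)            Estimator                                      Error
$\theta$                                $\hat\theta$                $E(\hat\theta)$    $\sigma_{\hat\theta}$

$\mu$                $n$                $\bar{Y}$                   $\mu$              $\sigma/\sqrt{n}$
$p$                  $n$                $\hat{p} = Y/n$             $p$                $\sqrt{pq/n}$
$\mu_1 - \mu_2$      $n_1$ and $n_2$    $\bar{Y}_1 - \bar{Y}_2$     $\mu_1 - \mu_2$    $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ *†
$p_1 - p_2$          $n_1$ and $n_2$    $\hat{p}_1 - \hat{p}_2$     $p_1 - p_2$        $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$ †

*$\sigma_1^2$ and $\sigma_2^2$ are the variances of populations 1 and 2, respectively.
†The two samples are assumed to be independent.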
Although unbiasedness is often a desirable property for a point estimator, not all
estimators are unbiased. In Chapter 1, we defined the sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}.$$

It probably seemed more natural to divide by n than by n − 1 in the preceding
expression and to calculate

$$S'^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}.$$

Example 8.1 establishes that $S'^2$ and $S^2$ are, respectively, biased and unbiased estimators
of the population variance $\sigma^2$. We initially identified $S^2$ as the sample variance
because it is an unbiased estimator.


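EXAMPLE 8.1

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show that

$$S'^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

is a biased estimator for $\sigma^2$ and that

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

is an unbiased estimator for $\sigma^2$.


Solution

It can be shown (see Exercise 1.9) that

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2.$$

Hence,

$$E\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right] = E\left(\sum_{i=1}^{n} Y_i^2\right) - nE(\bar{Y}^2) = \sum_{i=1}^{n} E(Y_i^2) - nE(\bar{Y}^2).$$

Notice that $E(Y_i^2)$ is the same for i = 1, 2, ..., n. We use this and the fact that the
variance of a random variable is given by $V(Y) = E(Y^2) - [E(Y)]^2$ to conclude that
$E(Y_i^2) = V(Y_i) + [E(Y_i)]^2 = \sigma^2 + \mu^2$, $E(\bar{Y}^2) = V(\bar{Y}) + [E(\bar{Y})]^2 = \sigma^2/n + \mu^2$,
and that

$$E\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right] = \sum_{i=1}^{n} (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)$$

$$= n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)$$

$$= n\sigma^2 - \sigma^2 = (n - 1)\sigma^2.$$

It follows that

$$E(S'^2) = \frac{1}{n} E\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right] = \frac{1}{n}(n - 1)\sigma^2 = \left(\frac{n - 1}{n}\right)\sigma^2$$

and that $S'^2$ is biased because $E(S'^2) \neq \sigma^2$. However,

$$E(S^2) = \frac{1}{n - 1} E\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right] = \frac{1}{n - 1}(n - 1)\sigma^2 = \sigma^2,$$

so we see that $S^2$ is an unbiased estimator for $\sigma^2$.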
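
The result of Example 8.1 can be checked exactly on a small case. The sketch below is an illustration only: it assumes a hypothetical population (one roll of a fair six-sided die, so the population variance is 35/12) and samples of size n = 3. It enumerates every equally likely sample and averages the two estimators with exact rational arithmetic. The function name `expected_variance_estimates` is ours, not the text's.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(values, n):
    """Exact (E[S'^2], E[S^2]) for n i.i.d. draws from equally likely `values`."""
    total_biased = Fraction(0)
    total_unbiased = Fraction(0)
    count = 0
    # Every ordered sample of size n is equally likely under i.i.d. sampling.
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        ss = sum((Fraction(y) - mean) ** 2 for y in sample)  # sum of squared deviations
        total_biased += ss / n          # S'^2 divides by n
        total_unbiased += ss / (n - 1)  # S^2 divides by n - 1
        count += 1
    return total_biased / count, total_unbiased / count


die = [1, 2, 3, 4, 5, 6]  # hypothetical population, chosen only for illustration
mu = Fraction(sum(die), len(die))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in die) / len(die)  # population variance

e_biased, e_unbiased = expected_variance_estimates(die, 3)
print("sigma^2 =", sigma2)
print("E(S'^2) =", e_biased)
print("E(S^2)  =", e_unbiased)
```

For this population the enumeration gives E(S²) equal to the population variance and E(S'²) equal to (n − 1)/n times it, in agreement with the algebra above.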




Two final comments can be made concerning the point estimators of Table 8.1.
First, the expected values and standard errors for $\bar{Y}$ and $\bar{Y}_1 - \bar{Y}_2$ given in the table
are valid regardless of the distribution of the population(s) from which the sample(s)
is (are) taken. Second, all four estimators possess probability distributions that are
approximately normal for large samples. The central limit theorem justifies this statement
for $\bar{Y}$ and $\hat{p}$, and similar theorems for functions of sample means justify the
assertion for $(\bar{Y}_1 - \bar{Y}_2)$ and $(\hat{p}_1 - \hat{p}_2)$. How large is "large"? For most populations,
the probability distribution of $\bar{Y}$ is mound-shaped even for relatively small samples
(as low as n = 5), and will tend rapidly to normality as the sample size approaches
n = 30 or larger. However, you sometimes will need to select larger samples from
binomial populations because the required sample size depends on p. The binomial
probability distribution is perfectly symmetric about its mean when p = 1/2 and
becomes more and more asymmetric as p tends to 0 or 1. As a rough rule, you can
assume that the distribution of $\hat{p}$ will be mound-shaped and approaching normality for
sample sizes such that $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1), or, as you demonstrated
in Exercise 7.70, if n > 9 (larger of p and q)/(smaller of p and q).
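
The two forms of the rough rule for binomial samples can be compared numerically. The following is a minimal sketch; the function names and the (p, n) pairs are hypothetical choices made for illustration, and values sitting exactly on the boundary are avoided because floating-point rounding could make the two checks disagree there.

```python
import math


def three_sigma_rule(p, n):
    """True when p +/- 3*sqrt(pq/n) lies strictly inside the interval (0, 1)."""
    q = 1.0 - p
    half_width = 3.0 * math.sqrt(p * q / n)
    return 0.0 < p - half_width and p + half_width < 1.0


def ratio_rule(p, n):
    """Equivalent form: n > 9 * (larger of p and q) / (smaller of p and q)."""
    q = 1.0 - p
    return n > 9.0 * max(p, q) / min(p, q)


# Hypothetical (p, n) pairs: a symmetric case and two skewed cases.
for p, n in [(0.5, 10), (0.1, 50), (0.1, 100), (0.02, 400), (0.02, 500)]:
    print(p, n, three_sigma_rule(p, n), ratio_rule(p, n))
```

With p = 1/2 a sample of only 10 passes the rule, while p = .02 needs more than 441 observations, which illustrates how strongly the required sample size depends on p.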

We know that $\bar{Y}$, $\hat{p}$, $(\bar{Y}_1 - \bar{Y}_2)$, and $(\hat{p}_1 - \hat{p}_2)$ are unbiased with near-normal (at least
mound-shaped) sampling distributions for moderate-sized samples; now let us use
this information to answer some practical questions. If we use an estimator once and
acquire a single estimate, how good will this estimate be? How much faith can we
place in the validity of our inference? The answers to these questions are provided in
the next section.


Evaluating the Goodness of a Point Estimator


One way to measure the goodness of any point estimation procedure is in terms of 
the distances between the estimates that it generates and the target parameter. This 
quantity, which varies randomly in repeated sampling, is called the error of estimation. 
Naturally we would like the error of estimation to be as small as possible. 
DEFINITION 8.5

The error of estimation $\varepsilon$ is the distance between an estimator and its target
parameter. That is, $\varepsilon = |\hat\theta - \theta|$.


FIGURE 8.4 Sampling distribution of a point estimator $\hat\theta$


Because $\hat\theta$ is a random variable, the error of estimation is also a random quantity,
and we cannot say how large or small it will be for a particular estimate. However, we
can make probability statements about it. For example, suppose that $\hat\theta$ is an unbiased
estimator of $\theta$ and has a sampling distribution as shown in Figure 8.4. If we select
two points, $(\theta - b)$ and $(\theta + b)$, located near the tails of the probability density, the
probability that the error of estimation $\varepsilon$ is less than b is represented by the shaded
area in Figure 8.4. That is,

$$P(|\hat\theta - \theta| < b) = P[-b < (\hat\theta - \theta) < b] = P(\theta - b < \hat\theta < \theta + b).$$

We can think of b as a probabilistic bound on the error of estimation. Although
we are not certain that a given error is less than b, Figure 8.4 indicates that $P(\varepsilon < b)$
is high. If b can be regarded from a practical point of view as small, then $P(\varepsilon < b)$
provides a measure of the goodness of a single estimate. This probability identifies
the fraction of times, in repeated sampling, that the estimator $\hat\theta$ falls within b units of
$\theta$, the target parameter.

Suppose that we want to find the value of b so that $P(\varepsilon < b) = .90$. This is easy
if we know the probability density function of $\hat\theta$. Then we seek a value b such that

$$\int_{\theta - b}^{\theta + b} f(\hat\theta)\, d\hat\theta = .90.$$

But whether we know the probability distribution of $\hat\theta$ or not, if $\hat\theta$ is unbiased we can
find an approximate bound on $\varepsilon$ by expressing b as a multiple of the standard error
of $\hat\theta$ (recall that the standard error of an estimator is simply a convenient alternative
name for the standard deviation of the estimator). For example, for k ≥ 1, if we let
$b = k\sigma_{\hat\theta}$, we know from Tchebysheff’s theorem that $\varepsilon$ will be less than $k\sigma_{\hat\theta}$ with
probability at least $1 - 1/k^2$. A convenient and often-used value of k is k = 2. Hence,
we know that $\varepsilon$ will be less than $b = 2\sigma_{\hat\theta}$ with probability at least .75.
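
To see how the two routes to a bound compare, the sketch below computes b for a target probability of .90 in two ways: exactly, under the added assumption that the estimator is unbiased and normally distributed, and from Tchebysheff's theorem, which makes no distributional assumption. The function names and the standard error of 1.0 are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist


def normal_bound(standard_error, prob=0.90):
    """b with P(|estimator - theta| < b) = prob, assuming an unbiased normal estimator."""
    z = NormalDist().inv_cdf(0.5 + prob / 2.0)  # upper (1 - prob)/2 point of N(0, 1)
    return z * standard_error


def tchebysheff_bound(standard_error, prob=0.90):
    """b = k * SE with 1 - 1/k**2 = prob; guarantees probability at least prob."""
    k = 1.0 / math.sqrt(1.0 - prob)
    return k * standard_error


se = 1.0  # hypothetical standard error, in the units of the estimator
print(round(normal_bound(se), 3))       # about 1.645 standard errors
print(round(tchebysheff_bound(se), 3))  # about 3.162 standard errors
```

The distribution-free bound is nearly twice as wide, which is the price paid for not knowing the sampling distribution. With prob = .75 the Tchebysheff multiplier is exactly k = 2, the value used in the text.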

You will find that, with a probability in the vicinity of .95, many random variables
observed in nature lie within 2 standard deviations of their mean. The probability
that Y lies in the interval $(\mu \pm 2\sigma)$ is shown in Table 8.2 for the normal, uniform,
and exponential probability distributions. The point is that $b = 2\sigma_{\hat\theta}$ is a good
approximate bound on the error of estimation in most practical situations. According to
Tchebysheff’s theorem, the probability that the error of estimation will be less than
this bound is at least .75. As we have previously observed, the bounds for probabilities
provided by Tchebysheff’s theorem are usually very conservative; the actual
probabilities usually exceed the Tchebysheff bounds by a considerable amount.


Table 8.2 Probability that $(\mu - 2\sigma) < Y < (\mu + 2\sigma)$

Distribution    Probability
Normal          .9544
Uniform         1.0000
Exponential     .9502
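
The entries of Table 8.2 can be reproduced from the three distribution functions. This is a sketch using only closed-form expressions; the helper names are ours. The uniform and exponential results do not depend on the particular interval or mean, so a uniform distribution on (0, 1) and an exponential distribution with mean 1 are assumed without loss of generality.

```python
import math


def normal_within(k):
    """P(|Y - mu| < k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2.0))


def uniform_within(k):
    """P(|Y - mu| < k*sigma) for a uniform distribution on (0, 1)."""
    # mu = 1/2 and sigma = 1/sqrt(12); the interval mu +/- k*sigma is clipped to (0, 1).
    half_width = k / math.sqrt(12.0)
    return min(1.0, 2.0 * half_width)


def exponential_within(k):
    """P(|Y - mu| < k*sigma) for an exponential distribution with mean 1 (mu = sigma = 1)."""
    lower = max(0.0, 1.0 - k)  # the density is zero below 0
    upper = 1.0 + k
    return math.exp(-lower) - math.exp(-upper)


for name, prob in [("Normal", normal_within(2)),
                   ("Uniform", uniform_within(2)),
                   ("Exponential", exponential_within(2))]:
    print(f"{name:12s} {prob:.4f}")
```

The computed values agree with Table 8.2 up to rounding in the last digit of the normal entry, and all three are far above the Tchebysheff lower bound of .75.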


EXAMPLE 8.2

A sample of n = 1000 voters, randomly selected from a city, showed y = 560 in
favor of candidate Jones. Estimate p, the fraction of voters in the population favoring
Jones, and place a 2-standard-error bound on the error of estimation.


Solution

We will use the estimator $\hat{p} = Y/n$ to estimate p. Hence, the estimate of p, the
fraction of voters favoring candidate Jones, is

$$\hat{p} = \frac{y}{n} = \frac{560}{1000} = .56.$$

How much faith can we place in this value? The probability distribution of $\hat{p}$ is
very accurately approximated by a normal probability distribution for large samples.
Since n = 1000, when $b = 2\sigma_{\hat{p}}$, the probability that $\varepsilon$ will be less than b is
approximately .95.

From Table 8.1, the standard error of the estimator for p is given by $\sigma_{\hat{p}} = \sqrt{pq/n}$.
Therefore,

$$b = 2\sigma_{\hat{p}} = 2\sqrt{\frac{pq}{n}}.$$

Unfortunately, to calculate b, we need to know p, and estimating p was the objective
of our sampling. This apparent stalemate is not a handicap, however, because $\sigma_{\hat{p}}$ varies
little for small changes in p. Hence, substitution of the estimate $\hat{p}$ for p produces
little error in calculating the exact value of $b = 2\sigma_{\hat{p}}$. Then, for our example, we have

$$b = 2\sigma_{\hat{p}} = 2\sqrt{\frac{pq}{n}} \approx 2\sqrt{\frac{(.56)(.44)}{1000}} = .03.$$


What is the significance of our calculations? The probability that the error of 
estimation is less than .03 is approximately .95. Consequently, we can be reasonably 
confident that our estimate, .56, is within .03 of the true value of p, the proportion of 
voters in the population who favor Jones. 
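
The arithmetic of Example 8.2 can be sketched in a few lines. The function name `proportion_estimate` is hypothetical; as in the solution above, the sample proportion is substituted for the unknown p when the standard error is estimated.

```python
import math


def proportion_estimate(y, n):
    """Return (p_hat, b): the sample proportion and an estimated 2-standard-error bound."""
    p_hat = y / n
    # The unknown p is replaced by p_hat, as in the solution to Example 8.2.
    b = 2.0 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, b


p_hat, b = proportion_estimate(560, 1000)
print(p_hat, round(b, 3))  # prints 0.56 0.031
```

The unrounded bound is about .0314, which the text reports as .03.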


EXAMPLE 8.3

A comparison of the durability of two types of automobile tires was obtained by road
testing samples of $n_1 = n_2 = 100$ tires of each type. The number of miles until
wear-out was recorded, where wear-out was defined as the number of miles until the
amount of remaining tread reached a prespecified small value. The measurements
for the two types of tires were obtained independently, and the following means and
variances were computed:

$$\bar{y}_1 = 26{,}400 \text{ miles}, \qquad \bar{y}_2 = 25{,}100 \text{ miles},$$
$$s_1^2 = 1{,}440{,}000, \qquad s_2^2 = 1{,}960{,}000.$$

Estimate the difference in mean miles to wear-out and place a 2-standard-error bound
on the error of estimation.


Solution

The point estimate of $(\mu_1 - \mu_2)$ is

$$(\bar{y}_1 - \bar{y}_2) = 26{,}400 - 25{,}100 = 1300 \text{ miles},$$

and the standard error of the estimator (see Table 8.1) is

$$\sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

We must know $\sigma_1^2$ and $\sigma_2^2$, or have good approximate values for them, to calculate
$\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$. Fairly accurate values of $\sigma_1^2$ and $\sigma_2^2$ often can be calculated from similar
experimental data collected at some prior time, or they can be obtained from the
current sample data by using the unbiased estimators

$$s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}, \qquad i = 1, 2.$$

These estimates will be adequate if the sample sizes are reasonably large (say,
$n_i > 30$) for i = 1, 2. The calculated values of $s_1^2$ and $s_2^2$, based on the two wear
tests, are $s_1^2 = 1{,}440{,}000$ and $s_2^2 = 1{,}960{,}000$. Substituting these values for $\sigma_1^2$ and
$\sigma_2^2$ in the formula for $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$, we have

$$\sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}}$$

$$= \sqrt{34{,}000} = 184.4 \text{ miles}.$$

Consequently, we estimate the difference in mean wear to be 1300 miles, and we
expect the error of estimation to be less than $2\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$, or 368.8 miles, with a
probability of approximately .95.
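
The computation in Example 8.3 can be sketched as follows. The function name is hypothetical, and the sample variances stand in for the unknown population variances, which the text regards as adequate for samples of this size.

```python
import math


def mean_difference_estimate(ybar1, ybar2, s1_sq, s2_sq, n1, n2):
    """Return (difference, estimated standard error, 2-standard-error bound)."""
    diff = ybar1 - ybar2
    # Sample variances replace the unknown population variances.
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return diff, se, 2.0 * se


diff, se, bound = mean_difference_estimate(26_400, 25_100, 1_440_000, 1_960_000, 100, 100)
print(diff, round(se, 1), round(bound, 1))
```

The standard error is the square root of 34,000, about 184.4 miles, so the bound is roughly 369 miles. The text's 368.8 comes from doubling the rounded value 184.4.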


Exercises 


An investigator is interested in the possibility of merging the capabilities of television and the 
Internet. A random sample of n = 50 Internet users yielded that the mean amount of time spent 
watching television per week was 11.5 hours and that the standard deviation was 3.5 hours. 
Estimate the population mean time that Internet users spend watching television and place a 
bound on the error of estimation. 
An increase in the rate of consumer savings frequently is tied to a lack of confidence in the
economy and is said to be an indicator of a recessional tendency in the economy. A random 
sampling of n = 200 savings accounts in a local community showed the mean increase in 
savings account values to be 7.2% over the past 12 months, with standard deviation 5.6%. 
Estimate the mean percentage increase in savings account values over the past 12 months for 
depositors in the community. Place a bound on your error of estimation. 


The Environmental Protection Agency and the University of Florida recently cooperated in a 
large study of the possible effects of trace elements in drinking water on kidney-stone disease. 
The accompanying table presents data on age, amount of calcium in home drinking water 
(measured in parts per million), and smoking activity. These data were obtained from individ- 
uals with recurrent kidney-stone problems, all of whom lived in the Carolinas and the Rocky 
Mountain states. 


                                   Carolinas    Rockies

Sample size                        467          191
Mean age                           45.1         46.4
Standard deviation of age          10.2         9.8
Mean calcium component (ppm)       11.3         40.1
Standard deviation of calcium      16.6         28.4
Proportion now smoking             .78          .61


a Estimate the average calcium concentration in drinking water for kidney-stone patients in 
the Carolinas. Place a bound on the error of estimation. 

b Estimate the difference in mean ages for kidney-stone patients in the Carolinas and in the 
Rockies. Place a bound on the error of estimation. 

c Estimate and place a 2-standard-deviation bound on the difference in proportions of 
kidney-stone patients from the Carolinas and Rockies who were smokers at the time of 
the study. 




A study was conducted to compare the mean number of police emergency calls per 8-hour shift 
in two districts of a large city. Samples of 100 8-hour shifts were randomly selected from the 
police records for each of the two regions, and the number of emergency calls was recorded 
for each shift. The sample statistics are given in the following table. 


                      Region
                      1        2

Sample size           100      100
Sample mean           2.4      3.1
Sample variance       1.44     2.64


a Estimate the difference in the mean number of police emergency calls per 8-hour shift
between the two districts in the city.

b Find a bound for the error of estimation.


The Mars twin rovers, Spirit and Opportunity, which roamed the surface of Mars in the winter
of 2004, found evidence that there was once water on Mars, raising the possibility that there
was once life on the planet. Do you think that the United States should pursue a program to send
humans to Mars? An opinion poll³ indicated that 49% of the 1093 adults surveyed think that
we should pursue such a program.


a Estimate the proportion of all Americans who think that the United States should pursue a
program to send humans to Mars. Find a bound on the error of estimation.

b The poll actually asked several questions. If we wanted to report an error of estimation that
would be valid for all of the questions on the poll, what value should we use? [Hint: What
is the maximum possible value for p × q?]


A random sample of 985 "likely voters"—those who are judged to be likely to vote in an 
upcoming election—were polled during a phone-athon conducted by the Republican Party. Of 
those contacted, 592 indicated that they intended to vote for the Republican running in the 
election. 


a According to this study, the estimate for p, the proportion of all "likely voters" who will
vote for the Republican candidate, is $\hat{p} = .601$. Find a bound for the error of estimation.

b If the "likely voters" are representative of those who will actually vote, do you think that
the Republican candidate will be elected? Why? How confident are you in your decision?

c Can you think of reasons that those polled might not be representative of those who actually
vote in the election?


In a study of the relationship between birth order and college success, an investigator found 
that 126 in a sample of 180 college graduates were firstborn or only children; in a sample of 
100 nongraduates of comparable age and socioeconomic background, the number of firstborn 
or only children was 54. Estimate the difference in the proportions of firstborn or only children 
for the two populations from which these samples were drawn. Give a bound for the error of 
estimation. 


Sometimes surveys provide interesting information about issues that did not seem to be the
focus of the survey initially. Results from two CNN/USA Today/Gallup polls, one conducted in
March 2003 and one in November 2003, were recently presented online.⁴ Both polls involved
samples of 1001 adults, aged 18 years and older. In the March sample, 45% of those sampled
claimed to be fans of professional baseball whereas 51% of those polled in November claimed
to be fans.


a Give a point estimate for the difference in the proportions of Americans who claim to be
baseball fans in March (at the beginning of the season) and November (after the World
Series). Provide a bound for the error of estimation.

b Is there sufficient evidence to conclude that fan support is greater at the end of the season?
Explain.


3. Source: “Space Exploration,” Associated Press Poll, http://www.pollingreport.com/science.htm#Space,
5 April 2004.

4. Source: Mark Gillespie, “Baseball Fans Overwhelmingly Want Mandatory Steroid Testing,”
http://www.gallup.com/content/print/.aspx?ci=11245, 14 February 2004.
Refer to Exercise 8.29. Give the point estimate and a bound on the error of estimation for the
proportion of adults who would have claimed to be baseball fans in March 2003. Is it likely 
that the value of your estimate is off by as much as 10%? Why? 


In a study to compare the perceived effects of two pain relievers, 200 randomly selected 
adults were given the first pain reliever, and 93% indicated appreciable pain relief. Of the 450 
individuals given the other pain reliever, 96% indicated experiencing appreciable relief. 


a Give an estimate for the difference in the proportions of all adults who would indicate 
perceived pain relief after taking the two pain relievers. Provide a bound on the error of 
estimation. 


b Based on your answer to part (a), is there evidence that proportions experiencing relief 
differ for those who take the two pain relievers? Why? 


An auditor randomly samples 20 accounts receivable from among the 500 such accounts of a
client's firm. The auditor lists the amount of each account and checks to see if the underlying
documents comply with stated procedures. The data are recorded in the accompanying table
(amounts are in dollars, Y = yes, and N = no).


Account   Amount   Compliance     Account   Amount   Compliance

 1        278      Y              11        188      N
 2        192      Y              12        212      N
 3        310      Y              13         92      Y
 4         94      N              14         56      Y
 5         86      Y              15        142      Y
 6        335      Y              16         37      Y
 7        310      N              17        186      N
 8        290      Y              18        221      Y
 9        221      Y              19        219      N
10        168      Y              20        305      Y


Estimate the total accounts receivable for the 500 accounts of the firm and place a bound on 
the error of estimation. Do you think that the average account receivable for the firm exceeds 
$250? Why? 


Refer to Exercise 8.32. From the data given on the compliance checks, estimate the proportion 
of the firm’s accounts that fail to comply with stated procedures. Place a bound on the error of 
estimation. Do you think that the proportion of accounts that comply with stated procedures 
exceeds 80%? Why? 


We can place a 2-standard-deviation bound on the error of estimation with any estimator for
which we can find a reasonable estimate of the standard error. Suppose that $Y_1, Y_2, \ldots, Y_n$
represent a random sample from a Poisson distribution with mean $\lambda$. We know that $V(Y_i) = \lambda$,
and hence $E(\bar{Y}) = \lambda$ and $V(\bar{Y}) = \lambda/n$. How would you employ $Y_1, Y_2, \ldots, Y_n$ to estimate $\lambda$?
How would you estimate the standard error of your estimator?


Refer to Exercise 8.34. In polycrystalline aluminum, the number of grain nucleation sites
per unit volume is modeled as having a Poisson distribution with mean $\lambda$. Fifty unit-volume
test specimens subjected to annealing under regime A produced an average of 20 sites per
unit volume. Fifty independently selected unit-volume test specimens subjected to annealing
regime B produced an average of 23 sites per unit volume.

a Estimate the mean number $\lambda_A$ of nucleation sites for regime A and place a 2-standard-error
bound on the error of estimation.

b Estimate the difference in the mean numbers of nucleation sites $\lambda_A - \lambda_B$ for regimes A
and B. Place a 2-standard-error bound on the error of estimation. Would you say that regime
B tends to produce a larger mean number of nucleation sites? Why?


If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponential distribution with mean $\theta$, then
$E(Y_i) = \theta$ and $V(Y_i) = \theta^2$. Thus, $E(\bar{Y}) = \theta$ and $V(\bar{Y}) = \theta^2/n$, or $\sigma_{\bar{Y}} = \theta/\sqrt{n}$. Suggest an
unbiased estimator for $\theta$ and provide an estimate for the standard error of your estimator.


Refer to Exercise 8.36. An engineer observes n = 10 independent length-of-life measurements
on a type of electronic component. The average of these 10 measurements is 1020 hours. If
these lengths of life come from an exponential distribution with mean $\theta$, estimate $\theta$ and place
a 2-standard-error bound on the error of estimation.


The number of persons coming through a blood bank until the first person with type A blood
is found is a random variable Y with a geometric distribution. If p denotes the probability
that any one randomly selected person will possess type A blood, then $E(Y) = 1/p$ and
$V(Y) = (1 - p)/p^2$.


a Find a function of Y that is an unbiased estimator of V(Y).


b Suggest how to form a 2-standard-error bound on the error of estimation when Y is used
to estimate 1/p.


Confidence Intervals 


An interval estimator is a rule specifying the method for using the sample measurements
to calculate two numbers that form the endpoints of the interval. Ideally, the
resulting interval will have two properties: First, it will contain the target parameter $\theta$;
second, it will be relatively narrow. One or both of the endpoints of the interval, being
functions of the sample measurements, will vary randomly from sample to sample.
Thus, the length and location of the interval are random quantities, and we cannot
be certain that the (fixed) target parameter $\theta$ will fall between the endpoints of any
single interval calculated from a single sample. This being the case, our objective is
to find an interval estimator capable of generating narrow intervals that have a high
probability of enclosing $\theta$.

Interval estimators are commonly called confidence intervals. The upper and lower
endpoints of a confidence interval are called the upper and lower confidence limits,
respectively. The probability that a (random) confidence interval will enclose $\theta$
(a fixed quantity) is called the confidence coefficient. From a practical point of view,
the confidence coefficient identifies the fraction of the time, in repeated sampling,
that the intervals constructed will contain the target parameter $\theta$. If we know that
the confidence coefficient associated with our estimator is high, we can be highly
confident that any confidence interval, constructed by using the results from a single
sample, will enclose $\theta$.

Suppose that $\hat\theta_L$ and $\hat\theta_U$ are the (random) lower and upper confidence limits,
respectively, for a parameter $\theta$. Then, if

$$P(\hat\theta_L \le \theta \le \hat\theta_U) = 1 - \alpha,$$

the probability $(1 - \alpha)$ is the confidence coefficient. The resulting random interval
defined by $[\hat\theta_L, \hat\theta_U]$ is called a two-sided confidence interval.

It is also possible to form a one-sided confidence interval such that

$$P(\hat\theta_L \le \theta) = 1 - \alpha.$$

Although only $\hat\theta_L$ is random in this case, the confidence interval is $[\hat\theta_L, \infty)$. Similarly,
we could have an upper one-sided confidence interval such that

$$P(\theta \le \hat\theta_U) = 1 - \alpha.$$

The implied confidence interval here is $(-\infty, \hat\theta_U]$.

One very useful method for finding confidence intervals is called the pivotal
method. This method depends on finding a pivotal quantity that possesses two
characteristics:


1. It is a function of the sample measurements and the unknown parameter $\theta$,
where $\theta$ is the only unknown quantity.
2. Its probability distribution does not depend on the parameter $\theta$.


If the probability distribution of the pivotal quantity is known, the following logic can
be used to form the desired interval estimate. If Y is any random variable, c > 0 is a
constant, and P(a < Y < b) = .7; then certainly P(ca < cY < cb) = .7. Similarly,
for any constant d, P(a + d < Y + d < b + d) = .7. That is, the probability of the
event (a < Y < b) is unaffected by a change of scale or a translation of Y. Thus,
if we know the probability distribution of a pivotal quantity, we may be able to use
operations like these to form the desired interval estimator. We illustrate this method
in the following examples.


EXAMPLE 8.4

Suppose that we are to obtain a single observation Y from an exponential distribution
with mean $\theta$. Use Y to form a confidence interval for $\theta$ with confidence coefficient .90.


Solution

The probability density function for Y is given by

$$f(y) = \begin{cases} (1/\theta)\, e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

By the transformation method of Chapter 6 we can see that $U = Y/\theta$ has the exponential
density function given by

$$f_U(u) = \begin{cases} e^{-u}, & u > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

The density function for U is graphed in Figure 8.5. $U = Y/\theta$ is a function of Y
(the sample measurement) and $\theta$, and the distribution of U does not depend on $\theta$. Thus,
we can use $U = Y/\theta$ as a pivotal quantity. Because we want an interval estimator
with confidence coefficient equal to .90, we find two numbers a and b such that

$$P(a \le U \le b) = .90.$$

FIGURE 8.5 Density function for U, Example 8.4

One way to do this is to choose a and b to satisfy

$$P(U < a) = \int_0^a e^{-u}\, du = .05 \quad \text{and} \quad P(U > b) = \int_b^\infty e^{-u}\, du = .05.$$

These equations yield

$$1 - e^{-a} = .05 \quad \text{and} \quad e^{-b} = .05 \quad \text{or, equivalently,} \quad a = .051,\ b = 2.996.$$

It follows that

$$.90 = P(.051 \le U \le 2.996) = P\left(.051 \le \frac{Y}{\theta} \le 2.996\right).$$

Because we seek an interval estimator for $\theta$, let us manipulate the inequalities
describing the event to isolate $\theta$ in the middle. Y has an exponential distribution, so
P(Y > 0) = 1, and we maintain the direction of the inequalities if we divide through
by Y. That is,

$$.90 = P\left(.051 \le \frac{Y}{\theta} \le 2.996\right) = P\left(\frac{.051}{Y} \le \frac{1}{\theta} \le \frac{2.996}{Y}\right).$$

Taking reciprocals (and hence reversing the direction of the inequalities), we obtain

$$.90 = P\left(\frac{Y}{.051} \ge \theta \ge \frac{Y}{2.996}\right) = P\left(\frac{Y}{2.996} \le \theta \le \frac{Y}{.051}\right).$$

Thus, we see that Y/2.996 and Y/.051 form the desired lower and upper confidence
limits, respectively. To obtain numerical values for these limits, we must observe an
actual value for Y and substitute that value into the given formulas for the confidence
limits. We know that limits of the form (Y/2.996, Y/.051) will include the true
(unknown) values of $\theta$ for 90% of the values of Y we would obtain by repeatedly
sampling from this exponential distribution.
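
The pivotal calculation of Example 8.4 can be sketched in code. The function names and the observed value y = 10 are hypothetical and chosen only for illustration; the cutoffs follow from solving 1 − e^(−a) = .05 and e^(−b) = .05, which puts equal probability in each tail.

```python
import math


def pivot_cutoffs(conf=0.90):
    """Return (a, b) with P(U < a) = P(U > b) = (1 - conf)/2 for U exponential with mean 1."""
    tail = (1.0 - conf) / 2.0
    a = -math.log(1.0 - tail)  # solves 1 - exp(-a) = tail
    b = -math.log(tail)        # solves exp(-b) = tail
    return a, b


def exponential_interval(y, conf=0.90):
    """Confidence interval (Y/b, Y/a) for the mean theta from a single observation y."""
    a, b = pivot_cutoffs(conf)
    return y / b, y / a


a, b = pivot_cutoffs()
lower, upper = exponential_interval(10.0)  # hypothetical observation y = 10
print(round(a, 3), round(b, 3))
print(round(lower, 2), round(upper, 1))
```

The interval is very wide, running from about one-third of the observation to almost twenty times it, which is to be expected when it is based on a single measurement.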


EXAMPLE 8.5

Suppose that we take a sample of size n = 1 from a uniform distribution defined on
the interval [0, $\theta$], where $\theta$ is unknown. Find a 95% lower confidence bound for $\theta$.

Solution

Because Y is uniform on [0, $\theta$], the methods of Chapter 6 can be used to show that
$U = Y/\theta$ is uniformly distributed over [0, 1]. That is,

\[ f_U(u) = \begin{cases} 1, & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

FIGURE 8.6
Density function for U, Example 8.5

Figure 8.6 contains a graph of the density function for U. Again, we see that U satisfies
the requirements of a pivotal quantity. Because we seek a 95% lower confidence limit
for $\theta$, let us determine the value for a so that P(U < a) = .95. That is,

\[ \int_0^a (1)\,du = .95, \]

or a = .95. Thus,

\[ P(U < .95) = P\left(\frac{Y}{\theta} < .95\right) = P(Y < .95\theta) = P\left(\frac{Y}{.95} < \theta\right) = .95. \]

We see that Y/.95 is a lower confidence limit for $\theta$, with confidence coefficient
.95. Because any observed Y must be less than $\theta$, it is intuitively reasonable to have
the lower confidence limit for $\theta$ slightly larger than the observed value of Y.
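The lower bound Y/.95 can be examined in the same way. This is a minimal sketch under stated assumptions: the helper names `uniform_lower_bound` and `lower_bound_coverage`, the seed, and the value $\theta = 3$ are hypothetical choices for illustration.

```python
import random


def uniform_lower_bound(y, conf=0.95):
    """Lower confidence bound for theta from one Uniform[0, theta] observation.

    U = Y/theta is Uniform[0, 1], so P(U < conf) = conf, which is the same
    event as Y/conf < theta.
    """
    return y / conf


def lower_bound_coverage(theta, reps=20000, seed=2):
    """Fraction of simulated lower bounds that fall below the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = rng.uniform(0.0, theta)
        hits += uniform_lower_bound(y) < theta
    return hits / reps


if __name__ == "__main__":
    print(uniform_lower_bound(1.9))   # bound for an observed y = 1.9
    print(lower_bound_coverage(3.0))  # should be near .95
```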


The two preceding examples illustrate the use of the pivotal method for finding
confidence limits for unknown parameters. In each instance, the interval estimates
were developed on the basis of a single observation from the distribution. These
examples were introduced primarily to illustrate the pivotal method. In the remaining
sections of this chapter, we use this method in conjunction with the sampling
distributions presented in Chapter 7 to develop some interval estimates of greater
practical importance.


Exercises


8.39  Suppose that the random variable Y has a gamma distribution with parameters $\alpha = 2$ and an
unknown $\beta$. In Exercise 6.46, you used the method of moment-generating functions to prove a
general result implying that $2Y/\beta$ has a $\chi^2$ distribution with 4 degrees of freedom (df). Using
$2Y/\beta$ as a pivotal quantity, derive a 90% confidence interval for $\beta$.


8.40  Suppose that the random variable Y is an observation from a normal distribution with unknown
mean $\mu$ and variance 1. Find a

a 95% confidence interval for $\mu$.
b 95% upper confidence limit for $\mu$.
c 95% lower confidence limit for $\mu$.


8.41  Suppose that Y is normally distributed with mean 0 and unknown variance $\sigma^2$. Then $Y^2/\sigma^2$
has a $\chi^2$ distribution with 1 df. Use the pivotal quantity $Y^2/\sigma^2$ to find a

a 95% confidence interval for $\sigma^2$.
b 95% upper confidence limit for $\sigma^2$.
c 95% lower confidence limit for $\sigma^2$.


8.42  Use the answers from Exercise 8.41 to find a

a 95% confidence interval for $\sigma$.
b 95% upper confidence limit for $\sigma$.
c 95% lower confidence limit for $\sigma$.


8.43  Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with a uniform
distribution on the interval (0, $\theta$). Let $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ and $U = (1/\theta)Y_{(n)}$.

a Show that U has distribution function

\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u^n, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]

b Because the distribution of U does not depend on $\theta$, U is a pivotal quantity. Find a 95%
lower confidence bound for $\theta$.


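8.44  Let Y have probability density function

\[ f_Y(y) = \begin{cases} \dfrac{2(\theta - y)}{\theta^2}, & 0 < y < \theta, \\ 0, & \text{elsewhere.} \end{cases} \]

a Show that Y has distribution function

\[ F_Y(y) = \begin{cases} 0, & y \le 0, \\ \dfrac{2y}{\theta} - \dfrac{y^2}{\theta^2}, & 0 < y < \theta, \\ 1, & y \ge \theta. \end{cases} \]

b Show that $Y/\theta$ is a pivotal quantity.
c Use the pivotal quantity from part (b) to find a 90% lower confidence limit for $\theta$.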


8.45  Refer to Exercise 8.44.

a Use the pivotal quantity from Exercise 8.44(b) to find a 90% upper confidence limit for $\theta$.
b If $\hat{\theta}_L$ is the lower confidence bound for $\theta$ obtained in Exercise 8.44(c) and $\hat{\theta}_U$ is the upper
bound found in part (a), what is the confidence coefficient of the interval $(\hat{\theta}_L, \hat{\theta}_U)$?


8.46  Refer to Example 8.4 and suppose that Y is a single observation from an exponential distribution
with mean $\theta$.

a Use the method of moment-generating functions to show that $2Y/\theta$ is a pivotal quantity
and has a $\chi^2$ distribution with 2 df.
b Use the pivotal quantity $2Y/\theta$ to derive a 90% confidence interval for $\theta$.
c Compare the interval you obtained in part (b) with the interval obtained in Example 8.4.


8.47  Refer to Exercise 8.46. Assume that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from an exponential
distribution with mean $\theta$.

a Use the method of moment-generating functions to show that $2\sum_{i=1}^{n} Y_i/\theta$ is a pivotal
quantity and has a $\chi^2$ distribution with 2n df.
b Use the pivotal quantity $2\sum_{i=1}^{n} Y_i/\theta$ to derive a 95% confidence interval for $\theta$.
c If a sample of size n = 7 yields $\bar{y} = 4.77$, use the result from part (b) to give a 95%
confidence interval for $\theta$.


8.48  Refer to Exercises 8.39 and 8.47. Assume that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from a
gamma-distributed population with $\alpha = 2$ and unknown $\beta$.

a Use the method of moment-generating functions to show that $2\sum_{i=1}^{n} Y_i/\beta$ is a pivotal
quantity and has a $\chi^2$ distribution with 4n df.
b Use the pivotal quantity $2\sum_{i=1}^{n} Y_i/\beta$ to derive a 95% confidence interval for $\beta$.
c If a sample of size n = 5 yields $\bar{y} = 5.39$, use the result from part (b) to give a 95%
confidence interval for $\beta$.


8.49  Refer to Exercise 8.48. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from a
gamma-distributed population with parameters $\alpha$ and $\beta$.

a If $\alpha = m$, where m is a known integer and $\beta$ is unknown, find a pivotal quantity that has a
$\chi^2$ distribution with m × n df. Use this pivotal quantity to derive a $100(1 - \alpha)\%$ confidence
interval for $\beta$.

b If $\alpha = c$, where c is a known constant but not an integer and $\beta$ is unknown, find a pivotal
quantity that has a gamma distribution with parameters $\alpha^* = cn$ and $\beta^* = 1$. Give a formula
for a $100(1 - \alpha)\%$ confidence interval for $\beta$.

c Applet Exercise Refer to part (b). If $\alpha = c = 2.57$ and a sample of size n = 10 yields $\bar{y} = 11.36$,
give a 95% confidence interval for $\beta$. [Use the applet Gamma Probabilities and Quantiles
to obtain appropriate quantiles for the pivotal quantity that you obtained in part (b).]


Large-Sample Confidence Intervals 


In Section 8.3, we presented some unbiased point estimators for the parameters $\mu$, $p$,
$\mu_1 - \mu_2$, and $p_1 - p_2$. As we indicated in that section, for large samples all these point
estimators have approximately normal sampling distributions with standard errors as
given in Table 8.1. That is, under the conditions of Section 8.3, if the target parameter
$\theta$ is $\mu$, $p$, $\mu_1 - \mu_2$, or $p_1 - p_2$, then for large samples,

\[ Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \]

possesses approximately a standard normal distribution. Consequently,
$Z = (\hat{\theta} - \theta)/\sigma_{\hat{\theta}}$ forms (at least approximately) a pivotal quantity, and the pivotal method
can be employed to develop confidence intervals for the target parameter $\theta$.


EXAMPLE 8.6

Let $\hat{\theta}$ be a statistic that is normally distributed with mean $\theta$ and standard error $\sigma_{\hat{\theta}}$. Find
a confidence interval for $\theta$ that possesses a confidence coefficient equal to $(1 - \alpha)$.

Solution

The quantity

\[ Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \]

has a standard normal distribution. Now select two values in the tails of this distribution,
$z_{\alpha/2}$ and $-z_{\alpha/2}$, such that (see Figure 8.7)

\[ P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha. \]

FIGURE 8.7
Location of $z_{\alpha/2}$ and $-z_{\alpha/2}$ (each tail has area $\alpha/2$)


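Substituting for Z in the probability statement, we have

\[ P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) = 1 - \alpha. \]

Multiplying by $\sigma_{\hat{\theta}}$, we obtain

\[ P(-z_{\alpha/2}\sigma_{\hat{\theta}} < \hat{\theta} - \theta < z_{\alpha/2}\sigma_{\hat{\theta}}) = 1 - \alpha, \]

and subtracting $\hat{\theta}$ from each term of the inequality, we get

\[ P(-\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} < -\theta < -\hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}) = 1 - \alpha. \]

Finally, multiplying each term by −1 and, consequently, changing the direction of
the inequalities, we have

\[ P(\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}) = 1 - \alpha. \]

Thus, the endpoints for a $100(1 - \alpha)\%$ confidence interval for $\theta$ are given by

\[ \hat{\theta}_L = \hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} \quad\text{and}\quad \hat{\theta}_U = \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}. \]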


By analogous arguments, we can determine that $100(1 - \alpha)\%$ one-sided confidence
limits, often called lower and upper bounds, are given by

\[ 100(1 - \alpha)\% \text{ lower bound for } \theta = \hat{\theta} - z_{\alpha}\sigma_{\hat{\theta}}, \]
\[ 100(1 - \alpha)\% \text{ upper bound for } \theta = \hat{\theta} + z_{\alpha}\sigma_{\hat{\theta}}. \]

Suppose that we compute both a $100(1 - \alpha)\%$ lower bound and a $100(1 - \alpha)\%$ upper
bound for $\theta$. We then decide to use both of these bounds to form a confidence interval
for $\theta$. What will be the confidence coefficient of this interval? Each bound is missed
with probability $\alpha$, and the two misses cannot occur together, so combining lower and
upper bounds, each with confidence coefficient $1 - \alpha$, yields a two-sided interval with
confidence coefficient $1 - 2\alpha$.
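The formulas of this section translate directly into a few lines of code. The following Python sketch is one possible rendering, not a prescribed implementation: the function names (`z_upper`, `two_sided_limits`, `lower_bound`, `upper_bound`) are assumptions made for illustration, and the normal quantile is taken from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist


def z_upper(tail_area):
    """Return z such that P(Z > z) = tail_area for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - tail_area)


def two_sided_limits(estimate, std_error, alpha):
    """Endpoints estimate -/+ z_{alpha/2} * std_error."""
    z = z_upper(alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def lower_bound(estimate, std_error, alpha):
    """One-sided lower bound estimate - z_alpha * std_error."""
    return estimate - z_upper(alpha) * std_error


def upper_bound(estimate, std_error, alpha):
    """One-sided upper bound estimate + z_alpha * std_error."""
    return estimate + z_upper(alpha) * std_error


if __name__ == "__main__":
    # Two 95% one-sided bounds together reproduce the 90% two-sided interval.
    print(lower_bound(10.0, 2.0, 0.05), upper_bound(10.0, 2.0, 0.05))
    print(two_sided_limits(10.0, 2.0, 0.10))
```

The demonstration at the bottom uses a made-up estimate of 10 with standard error 2; it shows that a pair of one-sided bounds with coefficient $1 - \alpha$ coincides with the two-sided interval with coefficient $1 - 2\alpha$.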

Under the conditions described in Section 8.3, the results given earlier in this
section can be used to find large-sample confidence intervals (one-sided or two-sided)
for $\mu$, $p$, $(\mu_1 - \mu_2)$, and $(p_1 - p_2)$. The following examples illustrate applications
of the general method developed in Example 8.6.


EXAMPLE 8.7

The shopping times of n = 64 randomly selected customers at a local supermarket
were recorded. The average and variance of the 64 shopping times were 33 minutes and
256 minutes², respectively. Estimate $\mu$, the true average shopping time per customer,
with a confidence coefficient of $1 - \alpha = .90$.

Solution

In this case, we are interested in the parameter $\theta = \mu$. Thus, $\hat{\theta} = \bar{y} = 33$ and $s^2 = 256$
for a sample of n = 64 shopping times. The population variance $\sigma^2$ is unknown, so
(as in Section 8.3), we use $s^2$ as its estimated value. The confidence interval

\[ \hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}} \]

has the form

\[ \bar{y} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \approx \bar{y} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right). \]

From Table 4, Appendix 3, $z_{\alpha/2} = z_{.05} = 1.645$; hence, the confidence limits are
given by

\[ \bar{y} - z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 33 - 1.645\left(\frac{16}{8}\right) = 29.71, \]

\[ \bar{y} + z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 33 + 1.645\left(\frac{16}{8}\right) = 36.29. \]

Thus, our confidence interval for $\mu$ is (29.71, 36.29). In repeated sampling,
approximately 90% of all intervals of the form $\bar{Y} \pm 1.645(S/\sqrt{n})$ include $\mu$, the true mean
shopping time per customer. Although we do not know whether the particular interval
(29.71, 36.29) contains $\mu$, the procedure that generated it yields intervals that do
capture the true mean in approximately 90% of all instances where the procedure is
used.
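The arithmetic of Example 8.7 can be reproduced as follows. This is a small sketch for checking the numbers; the function name `mean_interval` is a hypothetical helper, and the quantile comes from `statistics.NormalDist` instead of Table 4, so it is carried to more decimal places than 1.645.

```python
import math
from statistics import NormalDist


def mean_interval(ybar, s2, n, conf):
    """Large-sample interval ybar -/+ z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(s2 / n)
    return ybar - half_width, ybar + half_width


if __name__ == "__main__":
    lower, upper = mean_interval(ybar=33.0, s2=256.0, n=64, conf=0.90)
    print(round(lower, 2), round(upper, 2))  # 29.71 36.29
```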


EXAMPLE 8.8

Two brands of refrigerators, denoted A and B, are each guaranteed for 1 year. In a
random sample of 50 refrigerators of brand A, 12 were observed to fail before the
guarantee period ended. An independent random sample of 60 brand B refrigerators
also revealed 12 failures during the guarantee period. Estimate the true difference
$(p_1 - p_2)$ between proportions of failures during the guarantee period, with confidence
coefficient approximately .98.

Solution

The confidence interval

\[ \hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}} \]

now has the form

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}. \]

Because $p_1$, $q_1$, $p_2$, and $q_2$ are unknown, the exact value of $\sigma_{\hat{\theta}}$ cannot be evaluated. But
as indicated in Section 8.3, we can get a good approximation for $\sigma_{\hat{\theta}}$ by substituting
$\hat{p}_1$, $\hat{q}_1 = 1 - \hat{p}_1$, $\hat{p}_2$, and $\hat{q}_2 = 1 - \hat{p}_2$ for $p_1$, $q_1$, $p_2$, and $q_2$, respectively.

For this example, $\hat{p}_1 = .24$, $\hat{q}_1 = .76$, $\hat{p}_2 = .20$, $\hat{q}_2 = .80$, and $z_{.01} = 2.33$. The
desired 98% confidence interval is

\[ (.24 - .20) \pm 2.33\sqrt{\frac{(.24)(.76)}{50} + \frac{(.20)(.80)}{60}}, \]

that is, .04 ± .1851 or [−.1451, .2251].

Notice that this confidence interval contains zero. Thus, a zero value for the difference
in proportions $(p_1 - p_2)$ is "believable" (at approximately the 98% confidence level)
on the basis of the observed data. However, the interval also includes the value .1.
Thus, .1 represents another value of $(p_1 - p_2)$ that is "believable" on the basis of the
data that we have analyzed.
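The two-sample computation of Example 8.8 can be sketched in the same style. The function name `two_proportion_interval` is an assumption made for illustration. Because the code uses the unrounded quantile $z_{.01} \approx 2.326$ rather than the tabled 2.33, its endpoints differ from [−.1451, .2251] in the fourth decimal place.

```python
import math
from statistics import NormalDist


def two_proportion_interval(y1, n1, y2, n2, conf):
    """Large-sample interval for p1 - p2 with estimated standard error."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    std_error = math.sqrt(p1_hat * (1.0 - p1_hat) / n1
                          + p2_hat * (1.0 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error


if __name__ == "__main__":
    # Brand A: 12 failures in 50; brand B: 12 failures in 60.
    print(two_proportion_interval(12, 50, 12, 60, conf=0.98))
```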


FIGURE 8.8
Twenty-four realized 95% confidence intervals for a population proportion
(horizontal axis: estimated probability; vertical line at the true probability, 0.50)


We close this section with an empirical investigation of the performance of the
large-sample interval estimation procedure for a single population proportion p, based
on Y, the number of successes observed during n trials in a binomial experiment. In
this case, $\theta = p$; $\hat{\theta} = \hat{p} = Y/n$ and $\sigma_{\hat{\theta}} = \sigma_{\hat{p}} = \sqrt{p(1 - p)/n} \approx \sqrt{\hat{p}(1 - \hat{p})/n}$.
(As in Section 8.3, $\sqrt{\hat{p}(1 - \hat{p})/n}$ provides a good approximation for $\sigma_{\hat{p}}$.) The
appropriate confidence limits then are

\[ \hat{\theta}_L = \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \quad\text{and}\quad \hat{\theta}_U = \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}. \]

Figure 8.8 shows the results of 24 independent binomial experiments, each based on
35 trials when the true value of p = 0.5. For each of the experiments, we calculated the
number of successes y, the value of $\hat{p} = y/35$, and the corresponding 95% confidence
interval, using the formula $\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/35}$. (Notice that $z_{.025} = 1.96$.) In
the first binomial experiment, we observed y = 18, $\hat{p} = 18/35 = 0.5143$, and
$\sigma_{\hat{p}} \approx \sqrt{\hat{p}(1 - \hat{p})/n} = \sqrt{(.5143)(.4857)/35} = 0.0845$. So, the interval obtained in
the first experiment is .5143 ± 1.96(0.0845) or (0.3487, 0.6799). The estimate for
p from the first experiment is shown by the lowest large dot in Figure 8.8, and the
resulting confidence interval is given by the horizontal line through that dot. The
vertical line indicates the true value of p, 0.5 in this case. Notice that the interval
obtained in the first trial (of size 35) actually contains the true value of the population
proportion p.

The remaining 23 confidence intervals contained in this small simulation are given
by the rest of the horizontal lines in Figure 8.8. Notice that each individual interval
either contains the true value of p or it does not. However, the true value of p is
contained in 23 of the 24 intervals observed (95.8%).

If the same procedure were used many times, each individual interval would either
contain or fail to contain the true value of p, but the percentage of all intervals that
capture p would be very close to 95%. You are "95% confident" that the interval
contains the parameter because the interval was obtained by using a procedure that
generates intervals that do contain the parameter approximately 95% of the times the
procedure is used.
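The repeated-sampling interpretation can be explored with a short simulation. The Python sketch below is not the applet mentioned in the text; it is an independent illustration, and the names `proportion_interval` and `simulate_coverage`, the seed, and the number of repetitions are assumptions chosen for the example.

```python
import math
import random

Z_025 = 1.96  # z_{.025}, as used in the text


def proportion_interval(y, n, z=Z_025):
    """Large-sample interval p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def simulate_coverage(p, n, reps=5000, seed=0):
    """Fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = sum(rng.random() < p for _ in range(n))  # one binomial count
        lower, upper = proportion_interval(y, n)
        hits += lower <= p <= upper
    return hits / reps


if __name__ == "__main__":
    print(proportion_interval(18, 35))     # first experiment in Figure 8.8
    print(simulate_coverage(p=0.5, n=35))  # long-run capture rate
```

Changing `p` and `n` in the last call gives a rough way to see how the capture rate behaves for other settings, for example small n with p near 0 or 1, where the large-sample approximation is less reliable.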

The applet ConfidenceIntervalP (accessible at www.thomsonedu.com/statistics/ 
wackerly) was used to produce Figure 8.8. What happens if different values of n or 
different confidence coefficients are used? Do we obtain similar results if the true 
value of p is something other than 0.5? Several of the following exercises will allow 
you to use the applet to answer questions like these. 

In this section, we have used the pivotal method to derive large-sample confidence
intervals for the parameters $\mu$, $p$, $\mu_1 - \mu_2$, and $p_1 - p_2$ under the conditions of
Section 8.3. The key formula is

\[ \hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}}, \]

where the values of $\hat{\theta}$ and $\sigma_{\hat{\theta}}$ are as given in Table 8.1. When $\theta = \mu$ is the target
parameter, then $\hat{\theta} = \bar{Y}$ and $\sigma^2_{\hat{\theta}} = \sigma^2/n$, where $\sigma^2$ is the population variance. If the
true value of $\sigma^2$ is known, this value should be used in calculating the confidence
interval. If $\sigma^2$ is not known and n is large, there is no serious loss of accuracy if $s^2$
is substituted for $\sigma^2$ in the formula for the confidence interval. Similarly, if $\sigma_1^2$ and
$\sigma_2^2$ are unknown and both $n_1$ and $n_2$ are large, $s_1^2$ and $s_2^2$ can be substituted for these
values in the formula for a large-sample confidence interval for $\theta = \mu_1 - \mu_2$.
When $\theta = p$ is the target parameter, then $\hat{\theta} = \hat{p}$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$. Because p is
the unknown target parameter, $\sigma_{\hat{p}}$ cannot be evaluated. If n is large and we substitute
$\hat{p}$ for p (and $\hat{q} = 1 - \hat{p}$ for q) in the formula for $\sigma_{\hat{p}}$, however, the resulting confidence
interval will have approximately the stated confidence coefficient. For large $n_1$ and $n_2$,
similar statements hold when $\hat{p}_1$ and $\hat{p}_2$ are used to estimate $p_1$ and $p_2$, respectively,
in the formula for $\sigma^2_{\hat{p}_1 - \hat{p}_2}$. The theoretical justification for these substitutions will be
provided in Section 9.3.


Exercises 


8.50  Refer to Example 8.8. In this example, $p_1$ and $p_2$ were used to denote the proportions of
refrigerators of brands A and B, respectively, that failed during the guarantee periods.

a At the approximate 98% confidence level, what is the largest "believable value" for the
difference in the proportions of failures for refrigerators of brands A and B?

b At the approximate 98% confidence level, what is the smallest "believable value" for the
difference in the proportions of failures for refrigerators of brands A and B?

c If $p_1 - p_2$ actually equals 0.2251, which brand has the larger proportion of failures during
the warranty period? How much larger?

d If $p_1 - p_2$ actually equals −0.1451, which brand has the larger proportion of failures during
the warranty period? How much larger?

e As observed in Example 8.8, zero is a believable value of the difference. Would you
conclude that there is evidence of a difference in the proportions of failures (within the
warranty period) for the two brands of refrigerators? Why?


8.51  Applet Exercise What happens if we attempt to use the applet ConfidenceIntervalP
(accessible at www.thomsonedu.com/statistics/wackerly) to reproduce the results presented in
Figure 8.8? Access the applet. Don't change the value of p from .50 or the confidence
coefficient from .95, but use the "Sample Size" button to change the sample size to n = 35. Click
the button "One Sample" a single time. In the top left portion of the display, the sample values
are depicted by a set of 35 0s and 1s, and the value of the estimate for p and the resulting 95%
confidence interval are given below the sample values.


a What is the value of $\hat{p}$ that you obtained? Is it the same as the first value obtained, 0.5143,
when Figure 8.8 was generated? Does this surprise you? Why?

b Use the value of the estimate that you obtained and the formula for a 95% confidence
interval to verify that the confidence interval given on the display is correctly calculated.

c Does the interval that you obtained contain the true value of p?

d What is the length of the confidence interval that you obtained? Is it exactly the same
as the length of the first interval, (.3487, .6799), obtained when Figure 8.8 was generated?
Why?

e Click the button "One Sample" again. Is this interval different from the one previously
generated? Click the button "One Sample" three more times. How many distinctly
different intervals appear among the first 5 intervals generated? How many of the intervals
contain .5?

f Click the button "One Sample" until you have obtained 24 intervals. What percentage of
the intervals contain the true value of p = .5? Is the percentage close to the value that you
expected?


8.52  Applet Exercise Refer to Exercise 8.51. Don't change the value of p from .50 or the confidence
coefficient from .95, but use the button "Sample Size" to change the sample size to n = 50.
Click the button "One Sample" a single time.

a How long is the resulting confidence interval? How does the length of this interval compare
to the one that you obtained in Exercise 8.51(d)? Why are the lengths of the intervals
different?

b Click the button "25 Samples." Is the percentage of intervals that contain the true value of
p close to what you expected?

c Click the button "100 Samples." Is the percentage of intervals that contain the true value
of p close to what you expected?

d If you were to click the button "100 Samples" several times and calculate the percentage
of all of the intervals that contain the true value of p, what percentage of intervals do you
expect to capture p?


8.53  Applet Exercise Refer to Exercises 8.51 and 8.52. Change the value of p to .25 (put the cursor
on the vertical line and drag it to the left until 0.25 appears as the true probability). Change the
sample size to n = 75 and the confidence coefficient to .90.

a Click the button "One Sample" a single time.

  i What is the length of the resulting interval? Is the interval longer or shorter than that
  obtained in Exercise 8.51(d)?

  ii Give three reasons that the interval you obtained in part (i) is shorter than the interval
  obtained in Exercise 8.51(d).

b Click the button "100 Samples" a few times. Each click will produce 100 intervals and
provide you with the number and proportion of those 100 intervals that contain the true
value of p. After each click, write down the number of intervals that captured p = .25.

  i How many intervals did you generate? How many of the generated intervals captured
  the true value of p?

  ii What percentage of all the generated intervals captured p?


8.54  Applet Exercise Refer to Exercises 8.51–8.53. Change the value of p to .90. Change the
sample size to n = 10 and the confidence coefficient to 0.95. Click the button "100 Samples"
a few times. After each click, write down the number of intervals that captured p = .90.

a When the simulation produced ten successes in ten trials, what is the resulting realized 95%
confidence interval for p? What is the length of the interval? Why? How is this depicted
on the display?

b How many intervals did you generate? How many of the generated intervals captured the
true value of p?

c What percentage of all of the generated intervals captured p?

d Does the result of part (c) surprise you?

e Does the result in part (c) invalidate the large-sample confidence interval procedures
presented in this section? Why?


8.55  Applet Exercise Refer to Exercises 8.51–8.54. Change the value of p to .90. Change the
sample size to n = 100 and the confidence coefficient to .95. Click the button "100 Samples"
a few times. After each click, write down the number of intervals that captured p = .90 and
answer the questions posed in Exercise 8.54, parts (b)–(e).


8.56  Is America's romance with movies on the wane? In a Gallup Poll⁵ of n = 800 randomly chosen
adults, 45% indicated that movies were getting better whereas 43% indicated that movies were
getting worse.

a Find a 98% confidence interval for p, the overall proportion of adults who say that movies
are getting better.

b Does the interval include the value p = .50? Do you think that a majority of adults say
that movies are getting better?


8.57  Refer to Exercise 8.29. According to the result given there, 51% of the n = 1001 adults polled
in November 2003 claimed to be baseball fans. Construct a 99% confidence interval for the
proportion of adults who professed to be baseball fans in November 2003 (after the World
Series). Interpret this interval.


8.58  The administrators for a hospital wished to estimate the average number of days required for
inpatient treatment of patients between the ages of 25 and 34. A random sample of 500 hospital
patients between these ages produced a mean and standard deviation equal to 5.4 and 3.1 days,
respectively. Construct a 95% confidence interval for the mean length of stay for the population
of patients from which the sample was drawn.


5. Source: "Movie Mania Ebbing," Gallup Poll of 800 adults, http://www.usatoday.com/snapshot/news/2001-06-14-moviemania.htm., 16-18 March 2001.


8.59  When it comes to advertising, "'tweens" are not ready for the hard-line messages that advertisers
often use to reach teenagers. The Geppeto Group study⁶ found that 78% of 'tweens understand
and enjoy ads that are silly in nature. Suppose that the study involved n = 1030 'tweens.

a Construct a 90% confidence interval for the proportion of 'tweens who understand and
enjoy ads that are silly in nature.

b Do you think that "more than 75%" of all 'tweens enjoy ads that are silly in nature? Why?


8.60  What is the normal body temperature for healthy humans? A random sample of 130 healthy
human body temperatures provided by Allen Shoemaker⁷ yielded a mean of 98.25 degrees and a
standard deviation of 0.73 degrees.

a Give a 99% confidence interval for the average body temperature of healthy people.

b Does the confidence interval obtained in part (a) contain the value 98.6 degrees, the accepted
average temperature cited by physicians and others? What conclusions can you draw?


8.61  A small amount of the trace element selenium, from 50 to 200 micrograms (μg) per day, is
considered essential to good health. Suppose that independent random samples of $n_1 = n_2 = 30$
adults were selected from two regions of the United States, and a day's intake of selenium, from
both liquids and solids, was recorded for each person. The mean and standard deviation of the
selenium daily intakes for the 30 adults from region 1 were $\bar{y}_1 = 167.1$ μg and $s_1 = 24.3$ μg,
respectively. The corresponding statistics for the 30 adults from region 2 were $\bar{y}_2 = 140.9$ μg
and $s_2 = 17.6$ μg. Find a 95% confidence interval for the difference in the mean selenium
intake for the two regions.


8.62  The following statistics are the result of an experiment conducted by P. I. Ward to investigate
a theory concerning the molting behavior of the male Gammarus pulex, a small crustacean.⁸
If a male needs to molt while paired with a female, he must release her, and so loses her. The
theory is that the male G. pulex is able to postpone molting, thereby reducing the possibility
of losing his mate. Ward randomly assigned 100 pairs of males and females to two groups of
50 each. Pairs in the first group were maintained together (normal); those in the second group
were separated (split). The length of time to molt was recorded for both males and females,
and the means, standard deviations, and sample sizes are shown in the accompanying table.
(The number of crustaceans in each of the four samples is less than 50 because some in each
group did not survive until molting time.)

               Time to Molt (days)
               Mean      s      n
Males
  Normal       24.8     7.1    34
  Split        21.3     8.1    41
Females
  Normal        8.6     4.8    45
  Split        11.6     5.6    48

a Find a 99% confidence interval for the difference in mean molt time for "normal" males
versus those "split" from their mates.

b Interpret the interval.


6. Source: "Caught in the Middle," American Demographics, July 2001, pp. 14–15.

7. Source: Allen L. Shoemaker, "What's Normal? Temperature, Gender and Heart Rate," Journal of
Statistics Education (1996).

8. Source: "Gammarus pulex Control Their Moult Timing to Secure Mates," Animal Behaviour 32 (1984).


8.63  Most Americans love participating in or at least watching sporting events. Some feel that
sports have more than just entertainment value. In a survey of 1000 adults, conducted by KRC
Research & Consulting,⁹ 78% felt that spectator sports have a positive effect on society.

a Find a 95% confidence interval for the percentage of the public that feel that sports have a
positive effect on society.

b The poll reported a margin of error of "plus or minus 3.1%." Does this agree with your
answer to part (a)? What value of p produces the margin of error given by the poll?


8.64  In a CNN/USA Today/Gallup Poll, 1000 Americans were asked how well the term patriotic
described themselves.¹⁰ Some results from the poll are contained in the following summary table.

                        Age Group
                   All     18–34    60+
Very well          .53     .35      .77
Somewhat well      .31     .41      .17
Not very well      .10     .16      .04
Not well at all    .06     .08      .02

a If the 18–34 and 60+ age groups consisted of 340 and 150 individuals, respectively, find a
98% confidence interval for the difference in proportions of those in these age groups who
agreed that patriotic described them very well.

b Based on the interval that you obtained in part (a), do you think that the difference in
proportions of those who view themselves as patriotic is as large as 0.6? Explain.


8.65  For a comparison of the rates of defectives produced by two assembly lines, independent
random samples of 100 items were selected from each line. Line A yielded 18 defectives in the
sample, and line B yielded 12 defectives.

a Find a 98% confidence interval for the true difference in proportions of defectives for the
two lines.

b Is there evidence here to suggest that one line produces a higher proportion of defectives
than the other?


8.66  Historically, biology has been taught through lectures, and assessment of learning was
accomplished by testing vocabulary and memorized facts. A teacher-developed new curriculum,
Biology: A Community Content (BACC), is standards based, activity oriented, and inquiry
centered. Students taught using the historical and new methods were tested in the traditional
sense on biology concepts that featured biological knowledge and process skills. The results
of a test on biology concepts were published in The American Biology Teacher and are given
in the following table.¹¹

                                 Mean     Sample Size    Standard Deviation
Pretest: all BACC classes        13.38    372            5.59
Pretest: all traditional         14.06    368            5.45
Posttest: all BACC classes       18.50    365            8.03
Posttest: all traditional        16.50    298            6.96

a Give a 90% confidence interval for the mean posttest score for all BACC students.

b Find a 95% confidence interval for the difference in the mean posttest scores for BACC
and traditionally taught students.

c Does the confidence interval in part (b) provide evidence that there is a difference in the
mean posttest scores for BACC and traditionally taught students? Explain.


9. Source: Mike Tharp, "Ready, Set, Go. Why We Love Our Games: Sports Crazy," U.S. News & World
Report, 15 July 1997, p. 31.

10. Source: Adapted from "I'm a Yankee Doodle Dandy," Knowledge Networks: 2000, American
Demographics, July 2001, p. 9.

11. Source: William Leonard, Barbara Speziale, and John Pernick, "Performance Assessment of a
Standards-Based High School Biology Curriculum," The American Biology Teacher 63(5) (2001): 310-316.


8.67  One suggested method for solving the electric-power shortage in a region involves constructing
floating nuclear power plants a few miles offshore in the ocean. Concern about the possibility
of a ship collision with the floating (but anchored) plant has raised the need for an estimate
of the density of ship traffic in the area. The number of ships passing within 10 miles of the
proposed power-plant location per day, recorded for n = 60 days during July and August,
possessed a sample mean and variance of $\bar{y} = 7.2$ and $s^2 = 8.8$.

a Find a 95% confidence interval for the mean number of ships passing within 10 miles of
the proposed power-plant location during a 1-day time period.

b The density of ship traffic was expected to decrease during the winter months. A sample
of n = 90 daily recordings of ship sightings for December, January, and February yielded
a mean and variance of $\bar{y} = 4.7$ and $s^2 = 4.9$. Find a 90% confidence interval for the
difference in mean density of ship traffic between the summer and winter months.

c What is the population associated with your estimate in part (b)? What could be wrong
with the sampling procedure for parts (a) and (b)?


8.68  Suppose that $Y_1$, $Y_2$, $Y_3$, and $Y_4$ have a multinomial distribution with n trials and probabilities
$p_1$, $p_2$, $p_3$, and $p_4$ for the four cells. Just as in the binomial case, any linear combination of
$Y_1$, $Y_2$, $Y_3$, and $Y_4$ will be approximately normally distributed for large n.

a Determine the variance of $Y_1 - Y_2$. [Hint: Recall that the random variables $Y_i$ are dependent.]

b A study of attitudes among residents of Florida with regard to policies for handling nuisance
alligators in urban areas showed the following. Among 500 people sampled and presented
with four management choices, 6% said the alligators should be completely protected, 16%
said they should be destroyed by wildlife officers, 52% said they should be relocated live,
and 26% said that a regulated commercial harvest should be allowed. Estimate the
difference between the population proportion favoring complete protection and the population
proportion favoring destruction by wildlife officers. Use a confidence coefficient of .95.


8.69  The Journal of Communication, Winter 1978, reported on a study of viewing violence on TV.
Samples from populations with low viewing rates (10–19 programs per week) and high
viewing rates (40–49 programs per week) were divided into two age groups, and Y, the number
of persons watching a high number of violent programs, was recorded. The data for two age
groups are shown in the accompanying table, with $n_i$ denoting the sample size for each cell. If
$Y_1$, $Y_2$, $Y_3$, and $Y_4$ have independent binomial distributions with parameters $p_1$, $p_2$, $p_3$, and
$p_4$, respectively, find a 95% confidence interval for $(p_3 - p_1) - (p_4 - p_2)$. This function of the
$p_i$ values represents a comparison between the change in viewing habits for young adults and
the corresponding change for older adults, as we move from those with low viewing rates to
those with high viewing rates. (The data suggest that the rate of viewing violence may increase
with young adults but decrease with older adults.)

                              Age Group
Viewing Rate      16–34                    55 and Over
Low               y1 = 20    n1 = 31       y2 = 13    n2 = 30
High              y3 = 18    n3 = 26       y4 = 7     n4 = 28

Selecting the Sample Size 


The design of an experiment is essentially a plan for purchasing a quantity of infor- 
mation. Like any other commodity, information may be acquired at varying prices 
depending on the manner in which the data are obtained. Some measurements contain 
a large amount of information about the parameter of interest; others may contain lit- 
tle or none. Research, scientific or otherwise, is done in order to obtain information. 
Obviously, we should seek to obtain information at minimum cost. 

The sampling procedure, or experimental design as it is usually called, affects 
the quantity of information per measurement. This, together with the sample size $n$, 
controls the total amount of relevant information in a sample. At this point in our 
study, we will be concerned with the simplest sampling situation: random sampling 
from a relatively large population. We first devote our attention to selection of the 
sample size $n$. 

A researcher makes little progress in planning an experiment before encountering 
the problem of selecting the sample size. Indeed, one of the most frequent questions 
asked of the statistician is, How many measurements should be included in the sample? 
Unfortunately, the statistician cannot answer this question without knowing how much 
information the experimenter wishes to obtain. Referring specifically to estimation, 
we would like to know how accurate the experimenter wishes the estimate to be. The 
experimenter can indicate the desired accuracy by specifying a bound on the error of 
estimation. 

For instance, suppose that we wish to estimate the average daily yield $\mu$ of a 
chemical and we wish the error of estimation to be less than 5 tons with probability 
.95. Because approximately 95% of the sample means will lie within $2\sigma_{\bar{Y}}$ of $\mu$ in 
repeated sampling, we are asking that $2\sigma_{\bar{Y}}$ equal 5 tons (see Figure 8.9). Then 

$$\frac{2\sigma}{\sqrt{n}} = 5 \quad\text{and}\quad n = \frac{4\sigma^2}{25}.$$

We cannot obtain an exact numerical value for $n$ unless the population standard 
deviation $\sigma$ is known. This is exactly what we would expect because the variability 
associated with the estimator $\bar{Y}$ depends on the variability exhibited in the population 
from which the sample will be drawn. 

Lacking an exact value for $\sigma$, we use the best approximation available, such as 
an estimate $s$ obtained from a previous sample or knowledge of the range of the 
measurements in the population. Because the range is approximately equal to $4\sigma$ 
(recall the empirical rule), one-fourth of the range provides an approximate value 
of $\sigma$. For our example, suppose that the range of the daily yields is known to be 
approximately 84 tons. Then $\sigma \approx 84/4 = 21$ and 

$$n = \frac{4\sigma^2}{25} \approx \frac{(4)(21)^2}{25} = 70.56,$$

which we round up to $n = 71$. 

FIGURE 8.9 The approximate distribution of $\bar{Y}$ for large samples 


Using a sample size $n = 71$, we can be reasonably certain (with confidence coefficient 
approximately equal to .95) that our estimate will lie within 5 tons of the true average 
daily yield. 

Actually, we would expect the error of estimation to be much less than 5 tons. 
According to the empirical rule, the probability is approximately equal to .68 that the 
error of estimation will be less than $\sigma_{\bar{Y}} = 2.5$ tons. The probabilities .95 and .68 used 
in these statements are inexact because $\sigma$ was approximated. Although this method of 
choosing the sample size is only approximate for a specified accuracy of estimation, 
it is the best available and is certainly better than selecting the sample size intuitively. 
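
The daily-yield calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_size_for_mean` and its arguments are our own labels, and the multiplier is fixed at 2 to mirror the "two standard errors" rule used in the text rather than an exact normal quantile.

```python
import math


def sample_size_for_mean(sigma, bound, z=2.0):
    """Return (raw, rounded-up) sample size solving z * sigma / sqrt(n) = bound.

    sigma : assumed population standard deviation (a guess or prior estimate)
    bound : desired bound on the error of estimation
    z     : normal multiplier; 2.0 mirrors the empirical-rule value in the text
    """
    raw = (z * sigma / bound) ** 2
    return raw, math.ceil(raw)


# Daily-yield example: range of about 84 tons, so sigma is taken as 84 / 4 = 21.
raw_n, n = sample_size_for_mean(sigma=84 / 4, bound=5)
print(round(raw_n, 2), n)
```

Rounding up, rather than to the nearest integer, keeps the bound on the error of estimation at or below the requested value for the assumed $\sigma$.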

The method of choosing the sample sizes for all the large-sample estimation procedures 
outlined in Table 8.1 is analogous to that just described. The experimenter 
must specify a desired bound on the error of estimation and an associated confidence 
level $1 - \alpha$. For example, if the parameter is $\theta$ and the desired bound is $B$, we equate 

$$z_{\alpha/2}\,\sigma_{\hat{\theta}} = B,$$

where, as in Section 8.6, 

$$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}.$$

We illustrate the use of this method in the following examples. 
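
As a sketch of how the equation $z_{\alpha/2}\,\sigma_{\hat{\theta}} = B$ turns into a sample size, the block below solves it for $n$ in three large-sample cases, using the usual standard errors for a mean, a proportion, and a difference of two means with equal sample sizes. The symbols $\sigma$, $p$, $\sigma_1$, and $\sigma_2$ stand for whatever planning values the experimenter can supply; they are approximations, not known quantities.

```latex
\begin{align*}
\hat{\theta} = \bar{Y}: &\quad
  z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = B
  \;\Longrightarrow\; n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^{2}, \\[4pt]
\hat{\theta} = \hat{p}: &\quad
  z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = B
  \;\Longrightarrow\; n = \frac{z_{\alpha/2}^{2}\,p(1-p)}{B^{2}}, \\[4pt]
\hat{\theta} = \bar{Y}_1 - \bar{Y}_2,\; n_1 = n_2 = n: &\quad
  z_{\alpha/2}\sqrt{\frac{\sigma_1^{2} + \sigma_2^{2}}{n}} = B
  \;\Longrightarrow\; n = \frac{z_{\alpha/2}^{2}\,(\sigma_1^{2} + \sigma_2^{2})}{B^{2}}.
\end{align*}
```

In each case the computed value is rounded up to the next integer.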


EXAMPLE 8.9 

The reaction of an individual to a stimulus in a psychological experiment may take 
one of two forms, A or B. If an experimenter wishes to estimate the probability $p$ that a 
person will react in manner A, how many people must be included in the experiment? 
Assume that the experimenter will be satisfied if the error of estimation is less than 
.04 with probability equal to .90. Assume also that he expects $p$ to lie somewhere in 
the neighborhood of .6. 


Solution 


Because we have specified that $1 - \alpha = .90$, $\alpha$ must equal .10 and $\alpha/2 = .05$. The $z$ 
value corresponding to an area equal to .05 in the upper tail of the standard normal 
distribution is $z_{\alpha/2} = z_{.05} = 1.645$. We then require that 

$$1.645\,\sigma_{\hat{p}} = .04, \quad\text{or}\quad 1.645\sqrt{\frac{pq}{n}} = .04.$$

Because the standard error of $\hat{p}$ depends on $p$, which is unknown, we could use the 
guessed value of $p = .6$ provided by the experimenter as an approximate value for 
$p$. Then 

$$1.645\sqrt{\frac{(.6)(.4)}{n}} = .04, \qquad n = 406.$$


In this example, we assumed that $p \approx .60$. How would we proceed if we had no idea 
about the true value of $p$? In Exercise 7.76(a), we established that the maximum value 
for the variance of $\hat{p} = Y/n$ occurs when $p = .5$. If we did not know that $p \approx .6$, we 
would use $p = .5$, which would yield the maximum possible value for $n$: $n = 423$. 
No matter what the true value for $p$, $n = 423$ is large enough to provide an estimate 
that is within $B = .04$ of $p$ with probability .90. 
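
A minimal sketch of the calculation in Example 8.9, assuming Python 3.8 or later so that `statistics.NormalDist` is available for the normal quantile. The function name `sample_size_for_proportion` and the default `p_guess=0.5` (the worst case discussed above) are our own choices for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(bound, confidence, p_guess=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= bound.

    p_guess defaults to 0.5, which maximizes p(1-p) and so gives the
    most conservative (largest) sample size.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    raw = z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2
    return math.ceil(raw)


n_guess = sample_size_for_proportion(0.04, 0.90, p_guess=0.6)
n_worst = sample_size_for_proportion(0.04, 0.90)
print(n_guess, n_worst)
```

With the experimenter's guess $p = .6$ this gives 406, and with no information about $p$ it gives 423, in agreement with the values in the example.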


EXAMPLE 8.10 

An experimenter wishes to compare the effectiveness of two methods of training 
industrial employees to perform an assembly operation. The selected employees are to 
be divided into two groups of equal size, the first receiving training method 1 and the 
second receiving training method 2. After training, each employee will perform the 
assembly operation, and the length of assembly time will be recorded. The experimenter 
expects the measurements for both groups to have a range of approximately 
8 minutes. If the estimate of the difference in mean assembly times is to be correct 
to within 1 minute with probability .95, how many workers must be included in each 
training group? 


Solution 


The experimenter specified $1 - \alpha = .95$. Thus, $\alpha = .05$ and $z_{\alpha/2} = z_{.025} = 1.96$. 
Equating $1.96\,\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$ to 1 minute, we obtain 

$$1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.$$

Alternatively, because we desire $n_1$ to equal $n_2$, we may let $n_1 = n_2 = n$ and obtain 
the equation 

$$1.96\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} = 1.$$

As noted earlier, the variability of each method of assembly is approximately the 
same; hence, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Because the range, 8 minutes, is approximately equal 
to $4\sigma$, we have 

$$4\sigma \approx 8, \quad\text{or equivalently,}\quad \sigma \approx 2.$$

Substituting this value for $\sigma_1$ and $\sigma_2$ in the earlier equation, we obtain 

$$1.96\sqrt{\frac{(2)^2}{n} + \frac{(2)^2}{n}} = 1.$$

Solving, we obtain $n = 30.73$. Therefore, each group should contain $n = 31$ members. 
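
The same computation can be sketched in Python for two groups of equal size. The function name `sample_size_two_means` is hypothetical, and the standard deviations passed in are planning values (here range/4 = 2 minutes for both groups), not known quantities.

```python
import math
from statistics import NormalDist


def sample_size_two_means(bound, confidence, sigma1, sigma2):
    """Per-group n solving z * sqrt(sigma1^2/n + sigma2^2/n) = bound.

    Assumes equal group sizes, n1 = n2 = n. Returns (raw, rounded-up).
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    raw = z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / bound ** 2
    return raw, math.ceil(raw)


# Example 8.10: range of about 8 minutes in each group, so sigma is about 8 / 4 = 2.
raw_per_group, n_per_group = sample_size_two_means(1.0, 0.95, 8 / 4, 8 / 4)
print(round(raw_per_group, 2), n_per_group)
```

The unrounded value agrees with the 30.73 found above, so 31 workers per group are needed.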


Exercises 


8.70 Let $Y$ be a binomial random variable with parameter $p$. Find the sample size necessary to 
estimate $p$ to within .05 with probability .95 in the following situations: 

a If $p$ is thought to be approximately .9 

b If no information about $p$ is known (use $p = .5$ in estimating the variance of $\hat{p}$). 


8.71 A state wildlife service wants to estimate the mean number of days that each licensed hunter 
actually hunts during a given season, with a bound on the error of estimation equal to 2 hunting 
days. If data collected in earlier surveys have shown $\sigma$ to be approximately equal to 10, how 
many hunters must be included in the survey? 


8.72 Telephone pollsters often interview between 1000 and 1500 individuals regarding their opinions 
on various issues. Does the performance of colleges’ athletic teams have a positive impact on 
the public’s perception of the prestige of the institutions? A new survey is to be undertaken to 
see if there is a difference between the opinions of men and women on this issue. 

a If 1000 men and 1000 women are to be interviewed, how accurately could you estimate the 
difference in the proportions who think that the performance of their athletics teams has a 
positive impact on the perceived prestige of the institutions? Find a bound on the error of 
estimation. 

b Suppose that you were designing the survey and wished to estimate the difference in a pair 
of proportions, correct to within .02, with probability .9. How many interviewees should 
be included in each sample? 


8.73 Refer to Exercise 8.59. How many 'tweens should have been interviewed in order to estimate 
the proportion of 'tweens who understand and enjoy ads that are silly in nature, correct to within 
.02, with probability .99? Use the proportion from the previous sample in approximating the 
standard error of the estimate. 


8.74 Suppose that you want to estimate the mean pH of rainfalls in an area that suffers from 
heavy pollution due to the discharge of smoke from a power plant. Assume that $\sigma$ is in the 
neighborhood of .5 pH and that you want your estimate to lie within .1 of $\mu$ with probability 
near .95. Approximately how many rainfalls must be included in your sample (one pH reading 
per rainfall)? Would it be valid to select all of your water specimens from a single rainfall? 
Explain. 


8.75 Refer to Exercise 8.74. Suppose that you wish to estimate the difference between the mean 
acidity for rainfalls at two different locations, one in a relatively unpolluted area along the 
ocean and the other in an area subject to heavy air pollution. If you wish your estimate to 
be correct to the nearest .1 pH with probability near .90, approximately how many rainfalls 
(pH values) must you include in each sample? (Assume that the variance of the pH measurements 
is approximately .25 at both locations and that the samples are to be of equal size.) 


8.76 Refer to the comparison of the daily adult intake of selenium in two different regions of the 
United States, in Exercise 8.61. Suppose that you wish to estimate the difference in the mean 
daily intake between the two regions, correct to within 5 μg, with probability .90. If you plan 
to select an equal number of adults from the two regions (that is, if $n_1 = n_2$), how large should 
$n_1$ and $n_2$ be? 


8.77 Refer to Exercise 8.28. If the researcher wants to estimate the difference in proportions to 
within .05 with 90% confidence, how many graduates and nongraduates must be interviewed? 
(Assume that an equal number will be interviewed from each group.) 


8.78 Refer to Exercise 8.65. How many items should be sampled from each line if a 95% confidence 
interval for the true difference in proportions is to have width .2? Assume that samples of equal 
size will be taken from each line. 


8.79 Refer to Exercise 8.66. 

a Another similar study is to be undertaken to compare the mean posttest scores for BACC 
and traditionally taught high school biology students. The objective is to produce a 99% 
confidence interval for the true difference in the mean posttest scores. If we need to sample 
an equal number of BACC and traditionally taught students and want the width of the 
confidence interval to be 1.0, how many observations should be included in each group? 

b Repeat the calculations from part (a) if we are interested in comparing mean pretest scores. 

c Suppose that the researcher wants to construct 99% confidence intervals to compare both 
pretest and posttest scores for BACC and traditionally taught biology students. If her 
objective is that both intervals have widths no larger than 1 unit, what sample sizes should 
be used? 


Small-Sample Confidence Intervals 
for $\mu$ and $\mu_1 - \mu_2$ 


The confidence intervals for a population mean $\mu$ that we discuss in this section are 
based on the assumption that the experimenter's sample has been randomly selected 
from a normal population. The intervals are appropriate for samples of any size, 
and the confidence coefficients of the intervals are close to the specified values even 
when the population is not normal, as long as the departure from normality is not 
excessive. We rarely know the form of the population frequency distribution before 
we sample. Consequently, if an interval estimator is to be of any value, it must work 
reasonably well even when the population is not normal. "Working well" means that 
the confidence coefficient should not be affected by modest departures from normality. 
For most mound-shaped population distributions, experimental studies indicate that 
these confidence intervals maintain confidence coefficients close to the nominal values 
used in their calculation. 

We assume that $Y_1, Y_2, \ldots, Y_n$ represent a random sample selected from a normal 
population, and we let $\bar{Y}$ and $S^2$ represent the sample mean and sample variance, 
respectively. We would like to construct a confidence interval for the population 
mean when $V(Y_i) = \sigma^2$ is unknown and the sample size is too small to permit us 
to apply the large-sample techniques of the previous section. Under the assumptions 
just stated, Theorems 7.1 and 7.3 and Definition 7.2 imply that 

$$T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$

has a $t$ distribution with $(n - 1)$ df. The quantity $T$ serves as the pivotal quantity that 
we will use to form a confidence interval for $\mu$. From Table 5, Appendix 3, we can 
find values $t_{\alpha/2}$ and $-t_{\alpha/2}$ (see Figure 8.10) so that 

$$P(-t_{\alpha/2} \le T \le t_{\alpha/2}) = 1 - \alpha.$$

FIGURE 8.10 Location of $t_{\alpha/2}$ and $-t_{\alpha/2}$ 

The $t$ distribution has a density function very much like the standard normal density 
except that the tails are thicker (as illustrated in Figure 7.3). Recall that the values of 
$t_{\alpha/2}$ depend on the degrees of freedom $(n - 1)$ as well as on the confidence coefficient 
$(1 - \alpha)$. 

The confidence interval for $\mu$ is developed by manipulating the inequalities in the 
probability statement in a manner analogous to that used in the derivation presented 
in Example 8.6. In this case, the resulting confidence interval for $\mu$ is 

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right).$$


Under the preceding assumptions, we can also obtain $100(1 - \alpha)\%$ one-sided 
confidence limits for $\mu$. Notice that $t_\alpha$, given in Table 5, Appendix 3, is such that 

$$P(T \le t_\alpha) = 1 - \alpha.$$

Substituting $T$ into this expression and manipulating the resulting inequality, we 
obtain 

$$P\left[\bar{Y} - t_\alpha\left(S/\sqrt{n}\right) \le \mu\right] = 1 - \alpha.$$

Thus, $\bar{Y} - t_\alpha(S/\sqrt{n})$ is a $100(1 - \alpha)\%$ lower confidence bound for $\mu$. Analogously, 
$\bar{Y} + t_\alpha(S/\sqrt{n})$ is a $100(1 - \alpha)\%$ upper confidence bound for $\mu$. As in the large-sample 
case, if we determine both $100(1 - \alpha)\%$ lower and upper confidence bounds for $\mu$ 
and use the respective bounds as endpoints for a confidence interval, the resulting 
two-sided interval has confidence coefficient equal to $1 - 2\alpha$. 


EXAMPLE 8.11 

A manufacturer of gunpowder has developed a new powder, which was tested in eight 
shells. The resulting muzzle velocities, in feet per second, were as follows: 

3005 2925 2935 2965 

2995 3005 2937 2905 

Find a 95% confidence interval for the true average velocity $\mu$ for shells of this type. 
Assume that muzzle velocities are approximately normally distributed. 


Solution 

If we assume that the velocities $Y_i$ are normally distributed, the confidence interval 
for $\mu$ is 

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right),$$

where $t_{\alpha/2}$ is determined for $n - 1$ df. For the given data, $\bar{y} = 2959$ and $s = 39.1$. In this 
example, we have $n - 1 = 7$ df and, using Table 5, Appendix 3, $t_{\alpha/2} = t_{.025} = 2.365$. 
Thus, we obtain 

$$2959 \pm 2.365\left(\frac{39.1}{\sqrt{8}}\right), \quad\text{or}\quad 2959 \pm 32.7,$$

as the observed confidence interval for $\mu$. 
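
The arithmetic in Example 8.11 can be checked with a short Python sketch. The helper name `t_interval` is our own, and because the Python standard library has no $t$ quantile function, the critical value 2.365 is passed in by hand from the table value quoted in the solution.

```python
import math
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Return (center, half_width) of the interval ybar +/- t_crit * s / sqrt(n).

    t_crit must be looked up for n - 1 df at the desired confidence level.
    """
    n = len(sample)
    center = mean(sample)
    half_width = t_crit * stdev(sample) / math.sqrt(n)  # stdev uses n - 1
    return center, half_width


# Muzzle velocities (feet per second) from Example 8.11.
velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]
center, half_width = t_interval(velocities, t_crit=2.365)
print(center, round(half_width, 1))
```

The center is 2959 and the half-width rounds to 32.7, matching the interval in the solution.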


Suppose that we are interested in comparing the means of two normal populations, 
one with mean $\mu_1$ and variance $\sigma_1^2$ and the other with mean $\mu_2$ and variance $\sigma_2^2$. If the 
samples are independent, confidence intervals for $\mu_1 - \mu_2$ based on a $t$-distributed 
random variable can be constructed if we assume that the two populations have a 
common but unknown variance, $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unknown). 

If $\bar{Y}_1$ and $\bar{Y}_2$ are the respective sample means obtained from independent random 
samples from normal populations, the large-sample confidence interval for $(\mu_1 - \mu_2)$ 
is developed by using 

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

as a pivotal quantity. Because we assumed that the sampled populations are both 
normally distributed, $Z$ has a standard normal distribution, and using the assumption 
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, the quantity $Z$ may be rewritten as 

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$

Because $\sigma$ is unknown, we need to find an estimator of the common variance $\sigma^2$ so 
that we can construct a quantity with a $t$ distribution. 

Let $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$ denote the random sample of size $n_1$ from the first population 
and let $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$ denote an independent random sample of size $n_2$ 
from the second population. Then 

$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i} \quad\text{and}\quad \bar{Y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_{2i}.$$

The usual unbiased estimator of the common variance $\sigma^2$ is obtained by pooling the 
sample data to obtain the pooled estimator $S_p^2$: 

$$S_p^2 = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar{Y}_1)^2 + \sum_{i=1}^{n_2}(Y_{2i} - \bar{Y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2},$$

where $S_i^2$ is the sample variance from the $i$th sample, $i = 1, 2$. Notice that if $n_1 = n_2$, 
$S_p^2$ is simply the average of $S_1^2$ and $S_2^2$. If $n_1 \ne n_2$, $S_p^2$ is the weighted average of $S_1^2$ 
and $S_2^2$, with larger weight given to the sample variance associated with the larger 
sample size. Further, 


$$W = \frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2} = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar{Y}_1)^2}{\sigma^2} + \frac{\sum_{i=1}^{n_2}(Y_{2i} - \bar{Y}_2)^2}{\sigma^2}$$

is the sum of two independent $\chi^2$-distributed random variables with $(n_1 - 1)$ and 
$(n_2 - 1)$ df, respectively. Thus, $W$ has a $\chi^2$ distribution with $\nu = (n_1 - 1) + (n_2 - 1) = 
(n_1 + n_2 - 2)$ df. (See Theorems 7.2 and 7.3.) We now use the $\chi^2$-distributed variable 
$W$ and the independent standard normal quantity $Z$ defined in the previous paragraph 
to form a pivotal quantity: 


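$$T = \frac{Z}{\sqrt{W/\nu}} = \left[\frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}\right] \Bigg/ \sqrt{\frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2(n_1 + n_2 - 2)}} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

a quantity that by construction has a $t$ distribution with $(n_1 + n_2 - 2)$ df. 
Proceeding as we did earlier in this section, we see that the confidence interval for 
$(\mu_1 - \mu_2)$ has the form 

$$(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $t_{\alpha/2}$ is determined from the $t$ distribution with $(n_1 + n_2 - 2)$ df. 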


EXAMPLE 8.12 

To reach maximum efficiency in performing an assembly operation in a manufacturing 
plant, new employees require approximately a 1-month training period. A 
new method of training was suggested, and a test was conducted to compare the 
new method with the standard procedure. Two groups of nine new employees each 
were trained for a period of 3 weeks, one group using the new method and the 
other following the standard training procedure. The length of time (in minutes) 
required for each employee to assemble the device was recorded at the end of the 
3-week period. The resulting measurements are as shown in Table 8.3. Estimate 
the true mean difference $(\mu_1 - \mu_2)$ with confidence coefficient .95. Assume that the 
assembly times are approximately normally distributed, that the variances of the 
assembly times are approximately equal for the two methods, and that the samples are 
independent. 


Table 8.3 Data for Example 8.12 

Procedure    Measurements 

Standard     32  37  35  28  41  44  35  31  34 
New          35  31  29  25  34  40  27  32  31 


Solution 

For the data in Table 8.3, with sample 1 denoting the standard procedure, we have 

$$\bar{y}_1 = 35.22, \qquad \bar{y}_2 = 31.56,$$

$$\sum_{i=1}^{9}(y_{1i} - \bar{y}_1)^2 = 195.56, \qquad \sum_{i=1}^{9}(y_{2i} - \bar{y}_2)^2 = 160.22,$$

$$s_1^2 = 24.445, \qquad s_2^2 = 20.027.$$

Hence, 

$$s_p^2 = \frac{8(24.445) + 8(20.027)}{9 + 9 - 2} = \frac{195.56 + 160.22}{16} = 22.236 \quad\text{and}\quad s_p = 4.716.$$

Notice that, because $n_1 = n_2 = 9$, $s_p^2$ is the simple average of $s_1^2$ and $s_2^2$. Also, 
$t_{.025} = 2.120$ for $(n_1 + n_2 - 2) = 16$ df. The observed confidence interval is 
therefore 

$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$(35.22 - 31.56) \pm (2.120)(4.716)\sqrt{\frac{1}{9} + \frac{1}{9}}$$

$$3.66 \pm 4.71.$$

This confidence interval can be written in the form $[-1.05, 8.37]$. The interval is 
fairly wide and includes both positive and negative values. If $\mu_1 - \mu_2$ is positive, 
$\mu_1 > \mu_2$ and the standard procedure has a larger expected assembly time than the 
new procedure. If $\mu_1 - \mu_2$ is really negative, the reverse is true. Because the interval 
contains both positive and negative values, neither training method can be said to 
produce a mean assembly time that differs from the other. 
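
The pooled-variance interval of Example 8.12 can be reproduced with the sketch below. The function name `pooled_t_interval` is our own, and the critical value 2.120 is the table value quoted in the solution. One assumption is worth flagging: the last measurement for the new procedure is taken to be 31, the value that reproduces the stated mean of 31.56 and sum of squares of 160.22.

```python
import math
from statistics import mean, variance


def pooled_t_interval(sample1, sample2, t_crit):
    """Return (difference, pooled variance, half_width) for mu1 - mu2.

    Assumes independent samples from normal populations with equal variances;
    t_crit must be looked up for n1 + n2 - 2 df.
    """
    n1, n2 = len(sample1), len(sample2)
    # Pooled estimator: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    diff = mean(sample1) - mean(sample2)
    half_width = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return diff, sp2, half_width


standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]  # last value assumed to be 31
diff, sp2, half_width = pooled_t_interval(standard, new, t_crit=2.120)
print(round(diff, 2), round(sp2, 3), round(half_width, 2))
```

Because the text rounds the sample means before subtracting, the endpoints computed here can differ from those in the solution in the last reported digit.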


Summary of Small-Sample Confidence Intervals for Means of Normal 
Distributions with Unknown Variance(s) 

Parameter            Confidence Interval ($\nu$ = df) 

$\mu$                $\bar{Y} \pm t_{\alpha/2}\left(\dfrac{S}{\sqrt{n}}\right)$, $\quad \nu = n - 1$. 

$\mu_1 - \mu_2$      $(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2}\,S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, 

where $\nu = n_1 + n_2 - 2$ and 

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

(requires that the samples are independent and 
the assumption that $\sigma_1^2 = \sigma_2^2$). 


As the sample size (or sizes) gets large, the number of degrees of freedom for the 
$t$ distribution increases, and the $t$ distribution can be approximated quite closely by 
the standard normal distribution. As a result, the small-sample confidence intervals of 
this section are nearly indistinguishable from the large-sample confidence intervals 
of Section 8.6 for large $n$ (or large $n_1$ and $n_2$). The intervals are nearly equivalent 
when the degrees of freedom exceed 30. 

The confidence intervals for a single mean and the difference in two means were 
developed under the assumptions that the populations of interest are normally 
distributed. There is considerable empirical evidence that these intervals maintain their 
nominal confidence coefficient as long as the populations sampled have roughly 
mound-shaped distributions. If $n_1 \approx n_2$, the intervals for $\mu_1 - \mu_2$ also maintain 
their nominal confidence coefficients as long as the population variances are roughly 
equal. The independence of the samples is the most crucial assumption in using the 
confidence intervals developed in this section to compare two population means. 


Exercises 


8.80 Although there are many treatments for bulimia nervosa, some subjects fail to benefit from 
treatment. In a study to determine which factors predict who will benefit from treatment, Wendy 
Baell and E. H. Wertheim¹² found that self-esteem was one of the important predictors. The 
mean and standard deviation of posttreatment self-esteem scores for $n = 21$ subjects were 
$\bar{y} = 26.6$ and $s = 7.4$, respectively. Find a 95% confidence interval for the true posttreatment 
self-esteem scores. 


8.81 The carapace lengths of ten lobsters examined in a study of the infestation of the Thenus 
orientalis lobster by two types of barnacles, Octolasmis tridens and O. lowei, are given in the 
following table.¹³ Find a 95% confidence interval for the mean carapace length (in millimeters, 
mm) of T. orientalis lobsters caught in the seas in the vicinity of Singapore. 

Lobster Field Number    A061  A062  A066  A070  A067  A069  A064  A068  A065  A063 
Carapace Length (mm)    78    66    65    63    60    60    58    56    52    50 


12. Source: Wendy K. Baell and E. H. Wertheim, "Predictors of Outcome in the Treatment of Bulimia 
Nervosa," British Journal of Clinical Psychology 31 (1992). 


Lobster Field Number A06] A062 A066 A070 А067 A069 A064 A068 A065 A063 


Carapace Length (mm) 78 66 65 63 60 60 58 56 52 50 


8.82 Scholastic Assessment Test (SAT) scores, which have fallen slowly since the inception of the 
test, have now begun to rise. Originally, a score of 500 was intended to be average. The mean 
scores for 2005 were approximately 508 for the verbal test and 520 for the mathematics test. 
A random sample of the test scores of 20 seniors from a large urban high school produced the 
means and standard deviations listed in the accompanying table: 

                              Verbal    Mathematics 
Sample mean                   505       495 
Sample standard deviation     57        69 

a Find a 90% confidence interval for the mean verbal SAT scores for high school seniors 
from the urban high school. 

b Does the interval that you found in part (a) include the value 508, the true mean verbal SAT 
score for 2005? What can you conclude? 

c Construct a 90% confidence interval for the mean mathematics SAT score for the urban 
high school seniors. Does the interval include 520, the true mean mathematics score for 
2005? What can you conclude? 


8.83 Chronic anterior compartment syndrome is a condition characterized by exercise-induced pain 
in the lower leg. Swelling and impaired nerve and muscle function also accompany the pain, 
which is relieved by rest. Susan Beckham and her colleagues¹⁴ conducted an experiment involving 
ten healthy runners and ten healthy cyclists to determine if pressure measurements within 
the anterior muscle compartment differ between runners and cyclists. The data (compartment 
pressure, in millimeters of mercury) are summarized in the following table: 

                                   Runners            Cyclists 
Condition                          Mean     s         Mean     s 
Rest                               14.5     3.92      11.1     3.98 
80% maximal O2 consumption         12.2     3.49      11.5     4.95 

a Construct a 95% confidence interval for the difference in mean compartment pressures 
between runners and cyclists under the resting condition. 

b Construct a 90% confidence interval for the difference in mean compartment pressures 
between runners and cyclists who exercise at 80% of maximal oxygen (O2) consumption. 

c Consider the intervals constructed in parts (a) and (b). How would you interpret the results 
that you obtained? 


13. Source: W. B. Jeffries, H. K. Voris, and C. M. Yang, "Diversity and Distribution of the Pedunculate 
Barnacle Octolasmis Gray, 1825 Epizoic on the Scyllarid Lobster, Thenus orientalis (Lund 1793)," 
Crustaceana 46(3) (1984). 

14. Source: S. J. Beckham, W. A. Grana, P. Buckley, J. E. Breasile, and P. L. Claypool, "A Comparison 
of Anterior Compartment Pressures in Competitive Runners and Cyclists," American Journal of Sports 
Medicine 21(1) (1993). 


8.84 Organic chemists often purify organic compounds by a method known as fractional crystallization. 
An experimenter wanted to prepare and purify 4.85 g of aniline. Ten 4.85-gram specimens 
of aniline were prepared and purified to produce acetanilide. The following dry yields were 
obtained: 

3.85, 3.88, 3.90, 3.62, 3.72, 3.80, 3.85, 3.36, 4.01, 3.82 

Construct a 95% confidence interval for the mean number of grams of acetanilide that can be 
recovered from 4.85 grams of aniline. 


8.85 Two new drugs were given to patients with hypertension. The first drug lowered the blood 
pressure of 16 patients an average of 11 points, with a standard deviation of 6 points. The 
second drug lowered the blood pressure of 20 other patients an average of 12 points, with a 
standard deviation of 8 points. Determine a 95% confidence interval for the difference in the 
mean reductions in blood pressure, assuming that the measurements are normally distributed 
with equal variances. 


Text not available due to copyright restrictions 


8.87 Refer to Exercise 8.86. 

a Construct a 90% confidence interval for the difference in the mean price for light tuna 
packed in water and light tuna packed in oil. 

b Based on the interval obtained in part (a), do you think that the mean prices differ for light 
tuna packed in water and oil? Why? 


8.88 The Environmental Protection Agency (EPA) has collected data on LC50 measurements 
(concentrations that kill 50% of test animals) for certain chemicals likely to be found in 
freshwater rivers and lakes. (See Exercise 7.13 for additional details.) For certain species of 
fish, the LC50 measurements (in parts per million) for DDT in 12 experiments were as follows: 

16, 5, 21, 19, 10, 5, 8, 2, 7, 2, 4, 9 

Estimate the true mean LC50 for DDT with confidence coefficient .90. Assume that the LC50 
measurements have an approximately normal distribution. 


8.89 Refer to Exercise 8.88. Another common insecticide, diazinon, yielded LC50 measurements 
in three experiments of 7.8, 1.6, and 1.3. 

a Estimate the mean LC50 for diazinon, with a 90% confidence interval. 

b Estimate the difference between the mean LC50 for DDT and that for diazinon, with a 90% 
confidence interval. What assumptions are necessary for the method that you used to be 
valid? 


8.90 Do SAT scores for high school students differ depending on the students’ intended field of 
study? Fifteen students who intended to major in engineering were compared with 15 students 
who intended to major in language and literature. Given in the accompanying table are the 
means and standard deviations of the scores on the verbal and mathematics portion of the SAT 
for the two groups of students:¹⁶ 

                         Verbal                          Math 
Engineering              $\bar{y} = 446$   $s = 42$      $\bar{y} = 548$   $s = 57$ 
Language/literature      $\bar{y} = 534$   $s = 45$      $\bar{y} = 517$   $s = 52$ 

a Construct a 95% confidence interval for the difference in average verbal scores of students 
majoring in engineering and of those majoring in language/literature. 

b Construct a 95% confidence interval for the difference in average math scores of students 
majoring in engineering and of those majoring in language/literature. 

c Interpret the results obtained in parts (a) and (b). 

d What assumptions are necessary for the methods used previously to be valid? 


8.91 Seasonal ranges (in hectares) for alligators were monitored on a lake outside Gainesville, 
Florida, by biologists from the Florida Game and Fish Commission. Five alligators monitored 
in the spring showed ranges of 8.0, 12.1, 8.1, 18.2, and 31.7. Four different alligators monitored 
in the summer showed ranges of 102.0, 81.7, 54.7, and 50.7. Estimate the difference between 
mean spring and summer ranges, with a 95% confidence interval. What assumptions did you 
make? 


8.92 Solid copper produced by sintering (heating without melting) a powder under specified 
environmental conditions is then measured for porosity (the volume fraction due to voids) in a 
laboratory. A sample of $n_1 = 4$ independent porosity measurements have mean $\bar{y}_1 = .22$ and 
variance $s_1^2 = .0010$. A second laboratory repeats the same process on solid copper formed 
from an identical powder and gets $n_2 = 5$ independent porosity measurements with $\bar{y}_2 = .17$ 
and $s_2^2 = .0020$. Estimate the true difference between the population means $(\mu_1 - \mu_2)$ for these 
two laboratories, with confidence coefficient .95. 


8.93 A factory operates with two machines of type A and one machine of type B. The weekly repair 
costs $X$ for type A machines are normally distributed with mean $\mu_1$ and variance $\sigma^2$. The 
weekly repair costs $Y$ for machines of type B are also normally distributed but with mean $\mu_2$ 
and variance $3\sigma^2$. The expected repair cost per week for the factory is thus $2\mu_1 + \mu_2$. If you 
are given a random sample $X_1, X_2, \ldots, X_n$ on costs of type A machines and an independent 
random sample $Y_1, Y_2, \ldots, Y_m$ on costs for type B machines, show how you would construct 
a 95% confidence interval for $2\mu_1 + \mu_2$ 

a if $\sigma^2$ is known. 

b if $\sigma^2$ is not known. 


16. Source: "SAT Scores by Intended Field of Study," Riverside (Calif.) Press Enterprise, April 8, 1993. 


8.94 Suppose that we obtain independent samples of sizes $n_1$ and $n_2$ from two normal populations 
with equal variances. Use the appropriate pivotal quantity from Section 8.8 to derive a 
$100(1 - \alpha)\%$ upper confidence bound for $\mu_1 - \mu_2$. 


Confidence Intervals for $\sigma^2$


The population variance $\sigma^2$ quantifies the amount of variability in the population.
Many times, the actual value of $\sigma^2$ is unknown to an experimenter, and he or she must
estimate $\sigma^2$. In Section 8.3, we proved that $S^2 = [1/(n - 1)] \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is an
unbiased estimator for $\sigma^2$. Throughout our construction of confidence intervals for
$\mu$, we used $S^2$ to estimate $\sigma^2$ when $\sigma^2$ was unknown.

In addition to needing information about $\sigma^2$ to calculate confidence intervals for
$\mu$ and $\mu_1 - \mu_2$, we may be interested in forming a confidence interval for $\sigma^2$. For
example, if we performed a careful chemical analysis of tablets of a particular medication,
we would be interested in the mean amount of active ingredient per tablet
and the amount of tablet-to-tablet variability, as quantified by $\sigma^2$. Obviously, for a
medication, we desire a small amount of tablet-to-tablet variation and hence a small
value for $\sigma^2$.

To proceed with our interval estimation procedure, we require the existence of a
pivotal quantity. Again, assume that we have a random sample $Y_1, Y_2, \ldots, Y_n$ from
a normal distribution with mean $\mu$ and variance $\sigma^2$, both unknown. We know from
Theorem 7.3 that

$$\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

has a $\chi^2$ distribution with $(n - 1)$ df. We can then proceed by the pivotal method to
find two numbers $\chi^2_L$ and $\chi^2_U$ such that

$$P\left[\chi^2_L \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_U\right] = 1 - \alpha$$


for any confidence coefficient $(1 - \alpha)$. (The subscripts L and U stand for lower and
upper, respectively.) The $\chi^2$ density function is not symmetric, so we have some
freedom in choosing $\chi^2_L$ and $\chi^2_U$. We would like to find the shortest interval that
includes $\sigma^2$ with probability $(1 - \alpha)$. Generally, this is difficult and requires a
trial-and-error search for the appropriate values of $\chi^2_L$ and $\chi^2_U$. We compromise by choosing
points that cut off equal tail areas, as indicated in Figure 8.11. As a result, we obtain

$$P\left[\chi^2_{1-(\alpha/2)} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}\right] = 1 - \alpha,$$


FIGURE 8.11 Location of $\chi^2_{1-(\alpha/2)}$ and $\chi^2_{\alpha/2}$


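and a reordering of the inequality in the probability statement gives

$$P\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-(\alpha/2)}}\right] = 1 - \alpha.$$

The confidence interval for $\sigma^2$ is as follows.


A $100(1 - \alpha)\%$ Confidence Interval for $\sigma^2$

$$\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}}, \; \frac{(n-1)S^2}{\chi^2_{1-(\alpha/2)}}\right)$$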


EXAMPLE 8.13

An experimenter wanted to check the variability of measurements obtained by using
equipment designed to measure the volume of an audio source. Three independent
measurements recorded by this equipment for the same sound were 4.1, 5.2, and 10.2.
Estimate $\sigma^2$ with confidence coefficient .90.


Solution

If normality of the measurements recorded by this equipment can be assumed, the
confidence interval just developed applies. For the data given, $s^2 = 10.57$. With
$\alpha/2 = .05$ and $(n - 1) = 2$ df, Table 6, Appendix 3, gives $\chi^2_{.95} = .103$ and
$\chi^2_{.05} = 5.991$. Thus, the 90% confidence interval for $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{.05}}, \; \frac{(n-1)s^2}{\chi^2_{.95}}\right) \quad \text{or} \quad \left(\frac{(2)(10.57)}{5.991}, \; \frac{(2)(10.57)}{.103}\right),$$

and finally, (3.53, 205.24).
Notice that this interval for $\sigma^2$ is very wide, primarily because n is quite small.
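
The arithmetic in Example 8.13 can be checked with a short script. The sketch below is only an illustration and makes one simplifying assumption that is not part of the text: it handles the case of exactly three observations (2 df), where the $\chi^2$ distribution function has the closed form $F(x) = 1 - e^{-x/2}$, so no table or statistics library is needed. The function names are hypothetical.

```python
import math
import statistics


def chi2_quantile_df2(p):
    """Value with area p to its LEFT under the chi-square density with 2 df.

    With 2 df the distribution function is F(x) = 1 - exp(-x/2),
    so the quantile is available in closed form.
    """
    return -2.0 * math.log(1.0 - p)


def variance_ci_three_obs(data, conf=0.90):
    """Equal-tail confidence interval for sigma^2 from exactly 3 normal observations."""
    if len(data) != 3:
        raise ValueError("this sketch only covers n = 3 (2 df)")
    alpha = 1.0 - conf
    df = len(data) - 1
    s2 = statistics.variance(data)                   # sample variance, divisor n - 1
    chi_upper = chi2_quantile_df2(1.0 - alpha / 2)   # cuts off alpha/2 in the upper tail
    chi_lower = chi2_quantile_df2(alpha / 2)         # cuts off alpha/2 in the lower tail
    return s2, (df * s2 / chi_upper, df * s2 / chi_lower)


s2, (lower, upper) = variance_ci_three_obs([4.1, 5.2, 10.2], conf=0.90)
print(f"s^2 = {s2:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The lower endpoint agrees with the 3.53 found above. The upper endpoint comes out slightly larger than 205.24 (about 206) because the script uses the unrounded lower-tail point, roughly .1026, instead of the table value .103.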


We have previously indicated that the confidence intervals developed in Section
8.8 for $\mu$ and $\mu_1 - \mu_2$ had confidence coefficients near the nominal level even if the
underlying populations were not normally distributed. In contrast, the intervals for $\sigma^2$
presented in this section can have confidence coefficients that differ markedly from
the nominal level if the sampled population is not normally distributed.
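
A small simulation can make this sensitivity concrete. The sketch below is an illustration under stated assumptions, not a result from the text: it uses samples of size 3 (so that the $\chi^2$ points with 2 df have a closed form), a nominal level of .90, and an exponential population with mean 1 (variance 1) as one arbitrary example of a non-normal population. All names are hypothetical.

```python
import math
import random

# Chi-square points with 2 df, from F(x) = 1 - exp(-x/2).
CHI2_LOWER = -2.0 * math.log(0.95)   # area .05 to the left
CHI2_UPPER = -2.0 * math.log(0.05)   # area .05 to the right


def interval_covers(sample, true_var):
    """True if the nominal 90% interval for sigma^2 contains true_var (n = 3 only)."""
    mean = sum(sample) / len(sample)
    ss = sum((y - mean) ** 2 for y in sample)        # equals (n - 1) * s^2
    return ss / CHI2_UPPER <= true_var <= ss / CHI2_LOWER


def estimated_coverage(draw, true_var, reps=20000, seed=1):
    """Fraction of simulated samples of size 3 whose interval covers true_var."""
    rng = random.Random(seed)
    hits = sum(
        interval_covers([draw(rng) for _ in range(3)], true_var)
        for _ in range(reps)
    )
    return hits / reps


normal_cov = estimated_coverage(lambda rng: rng.gauss(0.0, 1.0), true_var=1.0)
expo_cov = estimated_coverage(lambda rng: rng.expovariate(1.0), true_var=1.0)
print(f"normal population:      {normal_cov:.3f}")
print(f"exponential population: {expo_cov:.3f}")
```

Under normality the estimated coverage should sit close to the nominal .90, whereas for the skewed exponential population it falls noticeably below that level.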
Exercises


The EPA has set a maximum noise level for heavy trucks at 83 decibels (dB). The manner in 
which this limit is applied will greatly affect the trucking industry and the public. One way to 
apply the limit is to require all trucks to conform to the noise limit. A second but less satisfactory
method is to require the truck fleet's mean noise level to be less than the limit. If the latter rule
is adopted, variation in the noise level from truck to truck becomes important because a large
value of $\sigma^2$ would imply that many trucks exceed the limit, even if the mean fleet level were
83 dB. A random sample of six heavy trucks produced the following noise levels (in decibels):


85.4 86.8 86.1 85.3 84.8 86.0.


Use these data to construct a 90% confidence interval for $\sigma^2$, the variance of the truck
noise-emission readings. Interpret your results.


In Exercise 8.81, we gave the carapace lengths of ten mature Thenus orientalis lobsters caught
in the seas in the vicinity of Singapore. For your convenience, the data are reproduced here.
Suppose that you wished to describe the variability of the carapace lengths of this population
of lobsters. Find a 90% confidence interval for the population variance $\sigma^2$.


Lobster Field Number   A061  A062  A066  A070  A067  A069  A064  A068  A065  A063
Carapace Length (mm)     78    66    65    63    60    60    58    56    52    50


Suppose that $S^2$ is the sample variance based on a sample of size n from a normal population
with unknown mean and variance. Derive a $100(1 - \alpha)\%$


a upper confidence bound for $\sigma^2$.


b lower confidence bound for $\sigma^2$.


Given a random sample of size n from a normal population with unknown mean and variance,
we developed a confidence interval for the population variance $\sigma^2$ in this section. What is the
formula for a confidence interval for the population standard deviation $\sigma$?


In Exercise 8.97, you derived upper and lower confidence bounds, each with confidence
coefficient $1 - \alpha$, for $\sigma^2$. How would you construct a $100(1 - \alpha)\%$


a upper confidence bound for $\sigma$?


b lower confidence bound for $\sigma$?


Industrial light bulbs should have a mean life length acceptable to potential users and a relatively 
small variation in life length. If some bulbs fail too early in their life, users become annoyed 
and are likely to switch to bulbs produced by a different manufacturer. Large variations above 
the mean reduce replacement sales; in general, variation in life lengths disrupts the user's 
replacement schedules. A random sample of 20 bulbs produced by a particular manufacturer 
produced the following lengths of life (in hours): 


2100 2302 1951 2067 2415 1883 2101 2146 2278 2019 
1924 2183 2077 2392 2286 2501 1946 2161 2253 1827 


Set up a 99% upper confidence bound for the standard deviation of the lengths of life for 
the bulbs produced by this manufacturer. Is the true population standard deviation less than 
150 hours? Why or why not? 


In laboratory work, it is desirable to run careful checks on the variability of readings produced
on standard samples. In a study of the amount of calcium in drinking water undertaken as part
of a water quality assessment, the same standard sample was run through the laboratory six
times at random intervals. The six readings, in parts per million, were 9.54, 9.61, 9.32, 9.48,
9.70, and 9.26. Estimate the population variance $\sigma^2$ for readings on this standard, using a 90%
confidence interval.


The ages of a random sample of five university professors are 39, 54, 61, 72, and 59. Using this 
information, find a 99% confidence interval for the population standard deviation of the ages 
of all professors at the university, assuming that the ages of university professors are normally 
distributed. 


A precision instrument is guaranteed to read accurately to within 2 units. A sample of four 
instrument readings on the same object yielded the measurements 353, 351, 351, and 355. Find 
a 90% confidence interval for the population variance. What assumptions are necessary? Does 
the guarantee seem reasonable? 


Summary 


The objective of many statistical investigations is to make inferences about population
parameters based on sample data. Often these inferences take the form of estimates,
either point estimates or interval estimates. We prefer unbiased estimators with small
variance. The goodness of an unbiased estimator $\hat{\theta}$ can be measured by $\sigma_{\hat{\theta}}$ because
the error of estimation is generally smaller than $2\sigma_{\hat{\theta}}$ with high probability. The mean
square error of an estimator, $\text{MSE}(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2$, is small only if the estimator
has small variance and small bias.
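
For readers who want to see where the mean square error decomposition comes from, the following short derivation may help. It is a sketch that assumes the usual definitions $\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$ and $B(\hat{\theta}) = E(\hat{\theta}) - \theta$, and it uses the fact that $E[\hat{\theta} - E(\hat{\theta})] = 0$ to drop the cross-product term.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\Big[\big\{(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\big\}^2\Big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,(E(\hat{\theta}) - \theta)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + (E(\hat{\theta}) - \theta)^2 \\
  &= V(\hat{\theta}) + 0 + [B(\hat{\theta})]^2 .
\end{align*}
```

Both terms on the right are nonnegative, which is why a small mean square error requires both a small variance and a small bias.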

Interval estimates of many parameters, such as $\mu$ and $p$, can be derived from the
normal distribution for large sample sizes because of the central limit theorem. If
sample sizes are small, the normality of the population must be assumed, and the
t distribution is used in deriving confidence intervals. However, the interval for a
single mean is quite robust in relation to moderate departures from normality. That
is, the actual confidence coefficient associated with intervals that have a nominal
confidence coefficient of $100(1 - \alpha)\%$ is very close to the nominal level even if the
population distribution differs moderately from normality. The confidence interval
for a difference in two means is also robust in relation to moderate departures from
normality and to the assumption of equal population variances if $n_1 \approx n_2$. As $n_1$ and
$n_2$ become more dissimilar, the assumption of equal population variances becomes
more crucial.

If sample measurements have been selected from a normal distribution, a confidence
interval for $\sigma^2$ can be developed through use of the $\chi^2$ distribution. These
intervals are very sensitive to the assumption that the underlying population is
normally distributed. Consequently, the actual confidence coefficient associated with
the interval estimation procedure can differ markedly from the nominal value if the
underlying population is not normally distributed.


Supplementary Exercises


Multiple Choice A survey was conducted to determine what adults prefer in cell phone 
services. The results of the survey showed that 73% of cell phone users wanted e-mail services, 
with a margin of error of ±4%. What is meant by the phrase "±4%"?


a They estimate that 4% of the surveyed population may change their minds between the 
time that the poll was conducted and the time that the results were published. 


b There is a 4% chance that the true percentage of cell phone users who want e-mail service
will not be in the interval (0.69, 0.77). 


c Only 4% of the population was surveyed. 


d It would be unlikely to get the observed sample proportion of 0.73 unless the actual pro- 
portion of cell phone users who want e-mail service is between 0.69 and 0.77. 


e The probability is .04 that the sample proportion is in the interval (0.69, 0.77). 


A random sample of size 25 was taken from a normal population with $\sigma^2 = 6$. A confidence
interval for the mean was given as (5.37, 7.37). What is the confidence coefficient associated
with this interval?


In a controlled pollination study involving Phlox drummondii, a spring-flowering annual plant
common along roadsides in sandy fields in central Texas, Karen Pittman and Donald Levin¹⁷
found that seed survival rates were not affected by water or nutrition deprivation. In the
experiment, flowers on plants were identified as males when they donated pollen and as females when
they were pollinated by donor pollen in three treatment groups: control, low water, and low
nutrient. The data in the following table reflect one aspect of the findings of the experiment: the
number of seeds surviving to maturity for each of the three groups for both male and female
parents.


                        Male                        Female
Treatment        n    Number Surviving       n    Number Surviving
Control         585         543             632         560
Low water       578         522             510         466
Low nutrient    568         510             589         546


a Find a 99% confidence interval for the difference between survival proportions in the 
low-water group versus the low-nutrient group for male parents. 

b Find a 99% confidence interval for the difference between survival proportions in male and 
female parents subjected to low water. 


17. Source: Karen Pittman and Donald Levin, "Effects of Parental Identities and Environment on
Components of Crossing Success on Phlox drummondii," American Journal of Botany 76(3) (1989).


Refer to Exercise 8.106. Suppose that you plan to estimate the difference in the survival rates of
seeds for male parents in low-water and low-nutrient environments to within .03 with probability
.95. If you plan to use an equal number of seeds from male parents in each environment (that
is, $n_1 = n_2$), how large should $n_1$ and $n_2$ be?


A chemist who has prepared a product designed to kill 60% of a particular type of insect wants 
to evaluate the kill rate of her preparation. What sample size should she use if she wishes to be 
95% confident that her experimental results fall within .02 of the true fraction of insects killed? 


To estimate the proportion of unemployed workers in Panama, an economist selected at random 
400 persons from the working class. Of these, 25 were unemployed. 


a Estimate the true proportion of unemployed workers and place bounds on the error of 
estimation. 


b How many persons must be sampled to reduce the bound on the error of estimation to .02? 


Past experience shows that the standard deviation of the yearly income of textile workers in a 
certain state is $400. How many textile workers would you need to sample if you wished to 
estimate the population mean to within $50.00, with probability .95? 


How many voters must be included in a sample collected to estimate the fraction of the popular 
vote favorable to a presidential candidate in a national election if the estimate must be correct 
to within .005? Assume that the true fraction lies somewhere in the neighborhood of .5. Use a 
confidence coefficient of approximately .95. 


In a poll taken among college students, 300 of 500 fraternity men favored a certain proposition 
whereas 64 of 100 nonfraternity men favored it. Estimate the difference in the proportions 
favoring the proposition and place a 2-standard-deviation bound on the error of estimation. 


Refer to Exercise 8.112. How many fraternity and nonfraternity men must be included in a 
poll if we wish to obtain an estimate, correct to within .05, for the difference in the proportions 
favoring the proposition? Assume that the groups will be of equal size and that $p = .6$ will
suffice as an approximation of both proportions. 


A chemical process has produced, on the average, 800 tons of chemical per day. The daily 
yields for the past week are 785, 805, 790, 793, and 802 tons. Estimate the mean daily yield, 
with confidence coefficient .90, from the data. What assumptions did you make? 


Refer to Exercise 8.114. Find a 90% confidence interval for $\sigma^2$, the variance of the daily yields.


Do we lose our memory capacity as we get older? In a study of the effect of glucose on memory
in elderly men and women, C. A. Manning and colleagues¹⁸ tested 16 volunteers (5 men and
11 women) for long-term memory, recording the number of words recalled from a list read to
each person. Each person was reminded of the words missed and was asked to recall as many
words as possible from the original list. The mean and standard deviation of the long-term
word memory scores were $\bar{y} = 79.47$ and $s = 25.25$. Give a 99% confidence interval for the
true long-term word memory scores for elderly men and women. Interpret this interval.


The annual main stem growth, measured for a sample of 17 4-year-old red pine trees, produced 
a mean of 11.3 inches and a standard deviation of 3.4 inches. Find a 90% confidence interval 
for the mean annual main stem growth of a population of 4-year-old red pine trees subjected to 
similar environmental conditions. Assume that the growth amounts are normally distributed. 


18. Source: C. A. Manning, J. L. Hall, and P. E. Gold, "Glucose Effects on Memory and Other
Neuropsychological Tests in Elderly Humans," Psychological Science 1(5) (1990).


Owing to the variability of trade-in allowance, the profit per new car sold by an automobile
dealer varies from car to car. The profits per sale (in hundreds of dollars), tabulated for the past
week, were 2.1, 3.0, 1.2, 6.2, 4.5, and 5.1. Find a 90% confidence interval for the mean profit
per sale. What assumptions must be valid for the technique that you used to be appropriate?


A mathematics test is given to a class of 50 students randomly selected from high school 1 
and also to a class of 45 students randomly selected from high school 2. For the class at high 
school 1, the sample mean is 75 points, and the sample standard deviation is 10 points. For 
the class at high school 2, the sample mean is 72 points, and the sample standard deviation 
is 8 points. Construct a 95% confidence interval for the difference in the mean scores. What
assumptions are necessary? 


Two methods for teaching reading were applied to two randomly selected groups of elementary
schoolchildren and were compared on the basis of a reading comprehension test given at the
end of the learning period. The sample means and variances computed from the test scores
are shown in the accompanying table. Find a 95% confidence interval for $(\mu_1 - \mu_2)$. What
assumptions are necessary?


Statistic                      Method 1   Method 2
Number of children in group       11         14
$\bar{y}$                         64         69
$s^2$                             52         71


A comparison of reaction times for two different stimuli in a psychological word-association
experiment produced the results (in seconds) shown in the accompanying table when applied
to a random sample of 16 people. Obtain a 90% confidence interval for $(\mu_1 - \mu_2)$. What
assumptions are necessary?


Stimulus 1     Stimulus 2
  1    2         4    1
  3    1         2    2
  2    3         3    3
  1    2         3    3


The length of time between billing and receipt of payment was recorded for a random sample 
of 100 of a certified public accountant (CPA) firm's clients. The sample mean and standard 
deviation for the 100 accounts were 39.1 days and 17.3 days, respectively. Find a 90% confidence
interval for the mean time between billing and receipt of payment for all of the CPA
firm's accounts. Interpret the interval. 


Television advertisers may mistakenly believe that most viewers understand most of the adver- 
tising that they see and hear. A recent research study asked 2300 viewers above age 13 to look 
at 30-second television advertising excerpts. Of these, 1914 of the viewers misunderstood all 
or part of the excerpt they saw. Find a 95% confidence interval for the proportion of all viewers
(of which the sample is representative) who will misunderstand all or part of the television 
excerpts used in this study. 


A survey of 415 corporate, government, and accounting executives of the Financial Accounting
Foundation found that 278 rated cash flow (as opposed to earnings per share, etc.) as the most
important indicator of a company's financial health. Assume that these 415 executives constitute
a random sample from the population of all executives. Use the data to find a 95% confidence
interval for the fraction of all corporate executives who consider cash flow the most important
measure of a company's financial health.


Suppose that independent samples of sizes $n_1$ and $n_2$ are taken from two normally distributed
populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. If $S_1^2$ and $S_2^2$ denote the respective sample
variances, Theorem 7.3 implies that $(n_1 - 1)S_1^2/\sigma_1^2$ and $(n_2 - 1)S_2^2/\sigma_2^2$ have $\chi^2$ distributions
with $n_1 - 1$ and $n_2 - 1$ df, respectively. Further, these $\chi^2$-distributed random variables are
independent because the samples were independently taken.


a Use these quantities to construct a random variable that has an F distribution with $n_1 - 1$
numerator degrees of freedom and $n_2 - 1$ denominator degrees of freedom.


b Use the F-distributed quantity from part (a) as a pivotal quantity, and derive a formula for
a $100(1 - \alpha)\%$ confidence interval for $\sigma_2^2/\sigma_1^2$.


A pharmaceutical manufacturer purchases raw material from two different suppliers. The mean
level of impurities is approximately the same for both suppliers, but the manufacturer is
concerned about the variability in the amount of impurities from shipment to shipment. If the level
of impurities tends to vary excessively for one source of supply, this could affect the quality
of the final product. To compare the variation in percentage impurities for the two suppliers,
the manufacturer selects ten shipments from each supplier and measures the percentage of
impurities in each shipment. The sample variances were $s_1^2 = .273$ and $s_2^2 = .094$, respectively.
Form a 95% confidence interval for the ratio of the true population variances.


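Let $\bar{Y}$ denote the mean of a sample of size 100 taken from a gamma distribution with known
$\alpha = c_0$ and unknown $\beta$. Show that an approximate $100(1 - \alpha)\%$ confidence interval for $\beta$ is
given by

$$\left(\frac{\bar{Y}}{c_0 + .1 z_{\alpha/2}\sqrt{c_0}}, \; \frac{\bar{Y}}{c_0 - .1 z_{\alpha/2}\sqrt{c_0}}\right).$$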


Suppose that we take a sample of size $n_1$ from a normally distributed population with mean
and variance $\mu_1$ and $\sigma_1^2$ and an independent sample of size $n_2$ from a normally distributed
population with mean and variance $\mu_2$ and $\sigma_2^2$. If it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$, then
the results given in Section 8.8 apply.

What can be done if we cannot assume that the unknown variances are equal but are fortunate
enough to know that $\sigma_2^2 = k\sigma_1^2$ for some known constant $k \ne 1$? Suppose, as previously, that
the sample means are given by $\bar{Y}_1$ and $\bar{Y}_2$ and the sample variances by $S_1^2$ and $S_2^2$, respectively.


a Show that $Z^*$ given below has a standard normal distribution.

$$Z^* = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma_1 \sqrt{\dfrac{1}{n_1} + \dfrac{k}{n_2}}}$$


b Show that $W^*$ given below has a $\chi^2$ distribution with $n_1 + n_2 - 2$ df.

$$W^* = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2/k}{\sigma_1^2}$$


c Notice that $Z^*$ and $W^*$ from parts (a) and (b) are independent. Finally, show that

$$T^* = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p^* \sqrt{\dfrac{1}{n_1} + \dfrac{k}{n_2}}}, \quad \text{where } S_p^{*2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2/k}{n_1 + n_2 - 2},$$

has a t distribution with $n_1 + n_2 - 2$ df.


d Use the result in part (c) to give a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$, assuming
that $\sigma_2^2 = k\sigma_1^2$.


e What happens if k = 1 in parts (a)-(d)?


We noted in Section 8.3 that if

$$S'^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1},$$

then $S'^2$ is a biased estimator of $\sigma^2$, but $S^2$ is an unbiased estimator of the same parameter. If
we sample from a normal population,


a find $V(S'^2)$.

b show that $V(S^2) > V(S'^2)$.


Exercise 8.129 suggests that $S^2$ is superior to $S'^2$ in regard to bias and that $S'^2$ is superior to
$S^2$ because it possesses smaller variance. Which is the better estimator? [Hint: Compare the
mean square errors.]


Refer to Exercises 8.129 and 8.130. $S^2$ and $S'^2$ are two estimators for $\sigma^2$ that are of the form
$c \sum_{i=1}^{n} (Y_i - \bar{Y})^2$. What value for c yields the estimator for $\sigma^2$ with the smallest mean square
error among all estimators of the form $c \sum_{i=1}^{n} (Y_i - \bar{Y})^2$?


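Refer to Exercises 6.17 and 8.14. The distribution function for a power family distribution is
given by

$$F(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{\alpha}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases}$$

where $\alpha, \theta > 0$. Assume that a sample of size n is taken from a population with a power family
distribution and that $\alpha = c$ where $c > 0$ is known.


a Show that the distribution function of $Y_{(n)} = \max\{Y_1, Y_2, \ldots, Y_n\}$ is given by

$$F_{Y_{(n)}}(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{nc}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases}$$

where $\theta > 0$.


b Show that $Y_{(n)}/\theta$ is a pivotal quantity and that for $0 < k < 1$

$$P\left(k < \frac{Y_{(n)}}{\theta} \le 1\right) = 1 - k^{cn}.$$


c Suppose that n = 5 and $\alpha = c = 2.4$.


i Use the result from part (b) to find k so that

$$P\left(k < \frac{Y_{(n)}}{\theta} \le 1\right) = 0.95.$$


ii Give a 95% confidence interval for $\theta$.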




Suppose that two independent random samples of $n_1$ and $n_2$ observations are selected from
normal populations. Further, assume that the populations possess a common variance $\sigma^2$. Let

$$S_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2}{n_i - 1}, \quad i = 1, 2.$$


a Show that $S_p^2$, the pooled estimator of $\sigma^2$ (which follows), is unbiased:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

b Find $V(S_p^2)$.


The small-sample confidence interval for $\mu$, based on Student's t (Section 8.8), possesses a
random width, in contrast to the large-sample confidence interval (Section 8.6), where the
width is not random if $\sigma^2$ is known. Find the expected value of the interval width in the
small-sample case if $\sigma^2$ is unknown.


A confidence interval is unbiased if the expected value of the interval midpoint is equal to
the estimated parameter. The expected value of the midpoint of the large-sample confidence
interval (Section 8.6) is equal to the estimated parameter, and the same is true for the
small-sample confidence intervals for $\mu$ and $(\mu_1 - \mu_2)$ (Section 8.8). For example, the midpoint of
the interval $\bar{y} \pm ts/\sqrt{n}$ is $\bar{y}$, and $E(\bar{Y}) = \mu$. Now consider the confidence interval for $\sigma^2$. Show
that the expected value of the midpoint of this confidence interval is not equal to $\sigma^2$.


The sample mean $\bar{Y}$ is a good point estimator of the population mean $\mu$. It can also be used to
predict a future value of Y independently selected from the population. Assume that you have
a sample mean $\bar{Y}$ and variance $S^2$ based on a random sample of n measurements from a normal
population. Use Student's t to form a pivotal quantity to find a prediction interval for some
new value of Y (say, $Y_p$) to be observed in the future. [Hint: Start with the quantity $Y_p - \bar{Y}$.]
Notice the terminology: Parameters are estimated; values of random variables are predicted.


Introduction


In Chapter 8, we presented some intuitive estimators for parameters often of interest
in practical problems. An estimator $\hat{\theta}$ for a target parameter $\theta$ is a function of the
random variables observed in a sample and therefore is itself a random variable.
Consequently, an estimator has a probability distribution, the sampling distribution
of the estimator. We noted in Section 8.2 that, if $E(\hat{\theta}) = \theta$, then the estimator has the
(sometimes) desirable property of being unbiased.

In this chapter, we undertake a more formal and detailed examination of some of the
mathematical properties of point estimators, particularly the notions of efficiency,
consistency, and sufficiency. We present a result, the Rao-Blackwell theorem, that
provides a link between sufficient statistics and unbiased estimators for parameters.
Generally speaking, an unbiased estimator with small variance is or can be made to be
a function of a sufficient statistic. We also demonstrate a method that can sometimes
be used to find minimum-variance unbiased estimators for parameters of interest. We
then offer two other useful methods for deriving estimators: the method of moments
and the method of maximum likelihood. Some properties of estimators derived by
these methods are discussed.


Relative Efficiency 


It usually is possible to obtain more than one unbiased estimator for the same target
parameter $\theta$. In Section 8.2 (Figure 8.3), we mentioned that if $\hat{\theta}_1$ and $\hat{\theta}_2$ denote two
unbiased estimators for the same parameter $\theta$, we prefer to use the estimator with
the smaller variance. That is, if both estimators are unbiased, $\hat{\theta}_1$ is relatively more
efficient than $\hat{\theta}_2$ if $V(\hat{\theta}_2) > V(\hat{\theta}_1)$. In fact, we use the ratio $V(\hat{\theta}_2)/V(\hat{\theta}_1)$ to define the
relative efficiency of two unbiased estimators.


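Given two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of a parameter $\theta$, with variances
$V(\hat{\theta}_1)$ and $V(\hat{\theta}_2)$, respectively, then the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$, denoted
eff$(\hat{\theta}_1, \hat{\theta}_2)$, is defined to be the ratio

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)}.$$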


If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators for $\theta$, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$,
eff$(\hat{\theta}_1, \hat{\theta}_2)$, is greater than 1 only if $V(\hat{\theta}_2) > V(\hat{\theta}_1)$. In this case, $\hat{\theta}_1$ is a better unbiased
estimator than $\hat{\theta}_2$. For example, if eff$(\hat{\theta}_1, \hat{\theta}_2) = 1.8$, then $V(\hat{\theta}_2) = (1.8)V(\hat{\theta}_1)$, and
$\hat{\theta}_1$ is preferred to $\hat{\theta}_2$. Similarly, if eff$(\hat{\theta}_1, \hat{\theta}_2)$ is less than 1 (say, .73), then $V(\hat{\theta}_2) =
(.73)V(\hat{\theta}_1)$, and $\hat{\theta}_2$ is preferred to $\hat{\theta}_1$. Let us consider an example involving two
different estimators for a population mean. Suppose that we wish to estimate the
mean of a normal population. Let $\hat{\theta}_1$ be the sample median, the middle observation
when the sample measurements are ordered according to magnitude (n odd) or the
average of the two middle observations (n even). Let $\hat{\theta}_2$ be the sample mean. Although
proof is omitted, it can be shown that the variance of the sample median, for large
n, is $V(\hat{\theta}_1) = (1.2533)^2(\sigma^2/n)$. Then the efficiency of the sample median relative to
the sample mean is

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)} = \frac{\sigma^2/n}{(1.2533)^2\sigma^2/n} = \frac{1}{(1.2533)^2} = .6366.$$


Thus, we see that the variance of the sample mean is approximately 64% of the
variance of the sample median. Therefore, we would prefer to use the sample mean
as the estimator for the population mean.
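
The large-sample efficiency of roughly .64 can be checked empirically. The sketch below is an illustration only: the sample size (n = 101), the number of replications, the seed, and the function name are arbitrary choices made for this example, and the simulated ratio will differ somewhat from .6366 because of both simulation error and the finite sample size.

```python
import random
import statistics


def estimator_variances(n=101, reps=4000, seed=7):
    """Simulate the variances of the sample mean and sample median
    for samples of size n from a standard normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = estimator_variances()

# Efficiency of the median relative to the mean: V(mean) / V(median).
eff_median_vs_mean = var_mean / var_median
print(f"simulated efficiency: {eff_median_vs_mean:.3f}")
```

A ratio well below 1 is consistent with the conclusion above that the sample mean is the preferable estimator of a normal population mean.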
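EXAMPLE 9.1

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution on the
interval $(0, \theta)$. Two unbiased estimators for $\theta$ are

$$\hat{\theta}_1 = 2\bar{Y} \quad \text{and} \quad \hat{\theta}_2 = \left(\frac{n+1}{n}\right) Y_{(n)},$$

where $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$. Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.


Solution

Because each $Y_i$ has a uniform distribution on the interval $(0, \theta)$, $\mu = E(Y_i) = \theta/2$
and $\sigma^2 = V(Y_i) = \theta^2/12$. Therefore,

$$E(\hat{\theta}_1) = E(2\bar{Y}) = 2E(\bar{Y}) = 2(\mu) = 2\left(\frac{\theta}{2}\right) = \theta,$$

and $\hat{\theta}_1$ is unbiased, as claimed. Further,

$$V(\hat{\theta}_1) = V(2\bar{Y}) = 4V(\bar{Y}) = 4\left[\frac{V(Y_i)}{n}\right] = \left(\frac{4}{n}\right)\left(\frac{\theta^2}{12}\right) = \frac{\theta^2}{3n}.$$

To find the mean and variance of $\hat{\theta}_2$, recall (see Exercise 6.74) that the density
function of $Y_{(n)}$ is given by

$$g_{(n)}(y) = n[F_Y(y)]^{n-1} f_Y(y) = \begin{cases} n\left(\dfrac{y}{\theta}\right)^{n-1}\left(\dfrac{1}{\theta}\right), & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$

Thus,

$$E(Y_{(n)}) = \frac{n}{\theta^n}\int_0^{\theta} y^n \, dy = \left(\frac{n}{n+1}\right)\theta,$$

and it follows that $E\{[(n + 1)/n]Y_{(n)}\} = \theta$; that is, $\hat{\theta}_2$ is an unbiased estimator for $\theta$.

Because

$$E(Y_{(n)}^2) = \frac{n}{\theta^n}\int_0^{\theta} y^{n+1} \, dy = \left(\frac{n}{n+2}\right)\theta^2,$$

we obtain

$$V(Y_{(n)}) = E(Y_{(n)}^2) - [E(Y_{(n)})]^2 = \left[\frac{n}{n+2} - \left(\frac{n}{n+1}\right)^2\right]\theta^2$$

and

$$V(\hat{\theta}_2) = V\left[\left(\frac{n+1}{n}\right)Y_{(n)}\right] = \left(\frac{n+1}{n}\right)^2 V(Y_{(n)}) = \left[\frac{(n+1)^2}{n(n+2)} - 1\right]\theta^2 = \frac{\theta^2}{n(n+2)}.$$

Therefore, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$ is given by

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)} = \frac{\theta^2/[n(n+2)]}{\theta^2/3n} = \frac{3}{n+2}.$$

This efficiency is less than 1 if n > 1. That is, if n > 1, $\hat{\theta}_2$ has a smaller variance than
$\hat{\theta}_1$, and therefore $\hat{\theta}_2$ is generally preferable to $\hat{\theta}_1$ as an estimator of $\theta$.


We present some methods for finding estimators with small variances later in this
chapter. For now we wish only to point out that relative efficiency is one important
criterion for comparing estimators.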
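
The result of Example 9.1 can also be examined by simulation. The following sketch is an illustration under arbitrary assumptions (n = 10, $\theta$ = 5, 20,000 replications, a fixed seed, and a hypothetical function name); it estimates the variances of $2\bar{Y}$ and $[(n+1)/n]Y_{(n)}$ from simulated uniform samples and compares their ratio with $3/(n + 2)$.

```python
import random
import statistics


def simulated_efficiency(n=10, theta=5.0, reps=20000, seed=3):
    """Estimate eff(theta1_hat, theta2_hat) = V(theta2_hat) / V(theta1_hat)
    for samples of size n from the uniform distribution on (0, theta)."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        est1.append(2.0 * sum(sample) / n)          # theta1_hat = 2 * sample mean
        est2.append((n + 1) / n * max(sample))      # theta2_hat = [(n+1)/n] * maximum
    return statistics.variance(est2) / statistics.variance(est1)


n = 10
simulated = simulated_efficiency(n=n)
theoretical = 3 / (n + 2)
print(f"simulated: {simulated:.3f}, theoretical 3/(n+2): {theoretical:.3f}")
```

The two numbers should be close, and both are well below 1, in line with the preference for the estimator based on the sample maximum.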


Exercises 


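In Exercise 8.8, we considered a random sample of size 3 from an exponential distribution with
density function given by

$$f(y) = \begin{cases} (1/\theta)e^{-y/\theta}, & 0 < y, \\ 0, & \text{elsewhere,} \end{cases}$$

and determined that $\hat{\theta}_1 = Y_1$, $\hat{\theta}_2 = (Y_1 + Y_2)/2$, $\hat{\theta}_3 = (Y_1 + 2Y_2)/3$, and $\hat{\theta}_5 = \bar{Y}$ are all unbiased
estimators for $\theta$. Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_5$, of $\hat{\theta}_2$ relative to $\hat{\theta}_5$, and of $\hat{\theta}_3$ relative
to $\hat{\theta}_5$.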


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a population with mean $\mu$ and variance $\sigma^2$.
Consider the following three estimators for $\mu$:

$$\hat{\mu}_1 = \frac{1}{2}(Y_1 + Y_2), \quad \hat{\mu}_2 = \frac{1}{4}Y_1 + \frac{Y_2 + \cdots + Y_{n-1}}{2(n-2)} + \frac{1}{4}Y_n, \quad \hat{\mu}_3 = \bar{Y}.$$


a Show that each of the three estimators is unbiased.


b Find the efficiency of $\hat{\mu}_3$ relative to $\hat{\mu}_2$ and $\hat{\mu}_1$, respectively.


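Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution on the interval
$(\theta, \theta + 1)$. Let

$$\hat{\theta}_1 = \bar{Y} - \frac{1}{2} \quad \text{and} \quad \hat{\theta}_2 = Y_{(n)} - \frac{n}{n+1}.$$


a Show that both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$.
b Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.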


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a uniform distribution on the interval
$(0, \theta)$. If $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$, the result of Exercise 8.18 is that $\hat{\theta}_1 = (n + 1)Y_{(1)}$ is an
unbiased estimator for $\theta$. If $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$, the results of Example 9.1 imply that
$\hat{\theta}_2 = [(n + 1)/n]Y_{(n)}$ is another unbiased estimator for $\theta$. Show that the efficiency of $\hat{\theta}_1$ to $\hat{\theta}_2$
is $1/n^2$. Notice that this implies that $\hat{\theta}_2$ is a markedly superior estimator.


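9.5 Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu$ and
variance $\sigma^2$. Two unbiased estimators of $\sigma^2$ are

\[
\hat\sigma_1^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \qquad \text{and} \qquad \hat\sigma_2^2 = \frac{1}{2}(Y_1 - Y_2)^2.
\]

Find the efficiency of $\hat\sigma_1^2$ relative to $\hat\sigma_2^2$.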


9.6 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a Poisson distribution
with mean $\lambda$. Consider $\hat\lambda_1 = (Y_1 + Y_2)/2$ and $\hat\lambda_2 = \bar{Y}$. Derive the efficiency of $\hat\lambda_1$ relative
to $\hat\lambda_2$.


9.7 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from an exponential distribution
with density function given by

\[
f(y) = \begin{cases} (1/\theta)e^{-y/\theta}, & 0 < y, \\ 0, & \text{elsewhere.} \end{cases}
\]

In Exercise 8.19, we determined that $\hat\theta_1 = nY_{(1)}$ is an unbiased estimator of $\theta$ with $\mathrm{MSE}(\hat\theta_1) = \theta^2$.
Consider the estimator $\hat\theta_2 = \bar{Y}$ and find the efficiency of $\hat\theta_1$ relative to $\hat\theta_2$.


*9.8 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a probability density function $f(y)$, which
has unknown parameter $\theta$. If $\hat\theta$ is an unbiased estimator of $\theta$, then under very general conditions

\[
V(\hat\theta) \ge I(\theta), \qquad \text{where} \qquad
I(\theta) = \left[ n E\left( -\frac{\partial^2 \ln f(Y)}{\partial \theta^2} \right) \right]^{-1}.
\]

(This is known as the Cramer-Rao inequality.) If $V(\hat\theta) = I(\theta)$, the estimator $\hat\theta$ is said to be
efficient.¹


a Suppose that $f(y)$ is the normal density with mean $\mu$ and variance $\sigma^2$. Show that $\bar{Y}$ is an
efficient estimator of $\mu$.

b This inequality also holds for discrete probability functions $p(y)$. Suppose that $p(y)$ is the
Poisson probability function with mean $\lambda$. Show that $\bar{Y}$ is an efficient estimator of $\lambda$.


9.3 Consistency


Suppose that a coin, which has probability p of resulting in heads, is tossed n times.
If the tosses are independent, then Y, the number of heads among the n tosses, has a
binomial distribution. If the true value of p is unknown, the sample proportion Y/n is
an estimator of p. What happens to this sample proportion as the number of tosses n
increases? Our intuition leads us to believe that as n gets larger, Y/n should get closer
to the true value of p. That is, as the amount of information in the sample increases,
our estimator should get closer to the quantity being estimated.

Figure 9.1 illustrates the values of $\hat{p} = Y/n$ for a single sequence of 1000 Bernoulli
trials when the true value of p is 0.5. Notice that the values of $\hat{p}$ bounce around 0.5
when the number of trials is small but approach and stay very close to p = 0.5 as the
number of trials increases.

The single sequence of 1000 trials illustrated in Figure 9.1 resulted (for larger n)
in values for the estimate that were very close to the true value, p = 0.5. Would
additional sequences yield similar results? Figure 9.2 shows the combined results of
50 sequences of 1000 trials. Notice that the 50 distinct sequences were not identical.
Rather, Figure 9.2 shows a "convergence" of sorts to the true value p = 0.5. This
is exhibited by a wider spread of the values of the estimates for smaller numbers of
trials but a much narrower spread of values of the estimates when the number of trials
is larger. Will we observe this same phenomenon for different values of p? Some of
the exercises at the end of this section will allow you to use applets (accessible at
www.thomsonedu.com/statistics/wackerly) to explore more fully for yourself.
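
For readers without access to the applets, sample paths like those described for Figures 9.1 and 9.2 can be generated with a few lines of Python. This is only a sketch under assumed settings: the function name `running_proportions`, the seeds, and the use of Python's built-in pseudorandom generator are illustrative choices, not the procedure used to draw the figures.

```python
import random


def running_proportions(n_trials=1000, p=0.5, seed=1):
    """Return [p_hat_1, ..., p_hat_n] for one simulated sequence of Bernoulli(p) trials."""
    rng = random.Random(seed)
    heads = 0
    path = []
    for n in range(1, n_trials + 1):
        heads += rng.random() < p      # adds 1 on a success, 0 on a failure
        path.append(heads / n)         # p_hat_n = Y_n / n
    return path


if __name__ == "__main__":
    # Fifty independent sequences, in the spirit of Figure 9.2.
    paths = [running_proportions(1000, 0.5, seed=s) for s in range(50)]
    for n in (10, 100, 1000):
        values = [path[n - 1] for path in paths]
        print(n, "spread of p_hat across sequences:", round(max(values) - min(values), 3))
```

The printed spread across sequences is wide for small n and narrow for large n, which is the funnel shape the text attributes to Figure 9.2.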

How can we technically express the type of "convergence" exhibited in Figure 9.2?
Because Y/n is a random variable, we may express this "closeness" to p in probabilistic
terms. In particular, let us examine the probability that the distance between
the estimator and the target parameter, $|(Y/n) - p|$, will be less than some arbitrary
positive real number $\varepsilon$. Figure 9.2 seems to indicate that this probability might be
increasing as n gets larger. If our intuition is correct and n is large, this probability,

\[
P\left( \left| \frac{Y}{n} - p \right| \le \varepsilon \right),
\]

should be close to 1. If this probability in fact does tend to 1 as $n \to \infty$, we then
say that (Y/n) is a consistent estimator of p, or that (Y/n) "converges in probability
to p."


FIGURE 9.1 Values of $\hat{p} = Y/n$ for a single sequence of 1000 Bernoulli trials, p = 0.5

FIGURE 9.2 Values of $\hat{p} = Y/n$ for 50 sequences of 1000 Bernoulli trials, p = 0.5


1. Exercises preceded by an asterisk are optional.
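
Because Y is binomial, the probability $P(|Y/n - p| \le \varepsilon)$ can be computed exactly rather than read off a plot. The Python sketch below does so by summing binomial probabilities on the log scale. The function name `prob_within` and the tiny tolerance that guards against floating-point ties at the boundary are assumptions of this illustration.

```python
from math import exp, lgamma, log


def prob_within(n, p, eps):
    """Exact P(|Y/n - p| <= eps) for Y ~ binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for y in range(n + 1):
        # |y/n - p| <= eps is equivalent to |y - n p| <= n eps.
        if abs(y - n * p) <= n * eps + 1e-9:
            log_pmf = (lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
                       + y * log(p) + (n - y) * log(1 - p))
            total += exp(log_pmf)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, prob_within(n, 0.5, 0.05))
```

For fixed $\varepsilon$ the printed probabilities climb toward 1 as n increases, which is the behavior the text calls convergence in probability.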
DEFINITION 9.2


The estimator $\hat\theta_n$ is said to be a consistent estimator of $\theta$ if, for any positive
number $\varepsilon$,

\[
\lim_{n \to \infty} P(|\hat\theta_n - \theta| \le \varepsilon) = 1
\]

or, equivalently,

\[
\lim_{n \to \infty} P(|\hat\theta_n - \theta| > \varepsilon) = 0.
\]


The notation $\hat\theta_n$ expresses that the estimator for $\theta$ is calculated by using a sample
of size n. For example, $\bar{Y}_2$ is the average of two observations whereas $\bar{Y}_{100}$ is the
average of the 100 observations contained in a sample of size n = 100. If $\hat\theta_n$ is an
unbiased estimator, the following theorem can often be used to prove that the estimator
is consistent.


THEOREM 9.1


An unbiased estimator $\hat\theta_n$ for $\theta$ is a consistent estimator of $\theta$ if

\[
\lim_{n \to \infty} V(\hat\theta_n) = 0.
\]


Proof


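If Y is any random variable with $E(Y) = \mu$ and $V(Y) = \sigma^2 < \infty$ and if k is
any positive constant, Tchebysheff's theorem (see Theorem 4.13) implies
that

\[
P(|Y - \mu| > k\sigma) \le \frac{1}{k^2}.
\]

Because $\hat\theta_n$ is an unbiased estimator for $\theta$, it follows that $E(\hat\theta_n) = \theta$. Let
$\sigma_{\hat\theta_n} = \sqrt{V(\hat\theta_n)}$ denote the standard error of the estimator $\hat\theta_n$. If we apply Tchebysheff's
theorem for the random variable $\hat\theta_n$, we obtain

\[
P(|\hat\theta_n - \theta| > k\sigma_{\hat\theta_n}) \le \frac{1}{k^2}.
\]

Let n be any fixed sample size. For any positive number $\varepsilon$,

\[
k = \frac{\varepsilon}{\sigma_{\hat\theta_n}}
\]

is a positive number. Application of Tchebysheff's theorem for this fixed n and
this choice of k shows that

\[
P(|\hat\theta_n - \theta| > \varepsilon)
= P\left( |\hat\theta_n - \theta| > \left[ \frac{\varepsilon}{\sigma_{\hat\theta_n}} \right] \sigma_{\hat\theta_n} \right)
\le \frac{1}{(\varepsilon/\sigma_{\hat\theta_n})^2} = \frac{V(\hat\theta_n)}{\varepsilon^2}.
\]

Thus, for any fixed n,

\[
0 \le P(|\hat\theta_n - \theta| > \varepsilon) \le \frac{V(\hat\theta_n)}{\varepsilon^2}.
\]

If $\lim_{n \to \infty} V(\hat\theta_n) = 0$ and we take the limit as $n \to \infty$ of the preceding
sequence of probabilities,

\[
\lim_{n \to \infty} (0) \le \lim_{n \to \infty} P(|\hat\theta_n - \theta| > \varepsilon) \le \lim_{n \to \infty} \frac{V(\hat\theta_n)}{\varepsilon^2} = 0.
\]

Thus, $\hat\theta_n$ is a consistent estimator for $\theta$.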
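
The key inequality in the proof, $P(|\hat\theta_n - \theta| > \varepsilon) \le V(\hat\theta_n)/\varepsilon^2$, can be examined numerically for the sample proportion, where $V(\hat{p}_n) = p(1-p)/n$. The Python sketch below compares the exact binomial tail probability with the Tchebysheff bound. The function names and the chosen values of p and $\varepsilon$ are hypothetical, and the small tolerance only protects against floating-point ties.

```python
from math import exp, lgamma, log


def exact_tail(n, p, eps):
    """Exact P(|Y/n - p| > eps) for Y ~ binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for y in range(n + 1):
        if abs(y - n * p) > n * eps + 1e-9:
            log_pmf = (lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
                       + y * log(p) + (n - y) * log(1 - p))
            total += exp(log_pmf)
    return total


def tchebysheff_bound(n, p, eps):
    """Upper bound V(p_hat_n) / eps^2 = p (1 - p) / (n eps^2) from the proof."""
    return p * (1 - p) / (n * eps ** 2)


if __name__ == "__main__":
    for n in (50, 200, 1000, 5000):
        print(n, exact_tail(n, 0.5, 0.05), tchebysheff_bound(n, 0.5, 0.05))
```

The bound is usually far from tight (for small n it can even exceed 1), but it tends to 0 as n grows, and that is all the proof requires.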


The consistency property given in Definition 9.2 and discussed in Theorem 9.1
involves a particular type of convergence of $\hat\theta_n$ to $\theta$. For this reason, the statement
"$\hat\theta_n$ is a consistent estimator for $\theta$" is sometimes replaced by the equivalent statement
"$\hat\theta_n$ converges in probability to $\theta$."


EXAMPLE 9.2


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution with mean $\mu$ and
variance $\sigma^2 < \infty$. Show that $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is a consistent estimator of $\mu$. (Note:
We use the notation $\bar{Y}_n$ to explicitly indicate that $\bar{Y}$ is calculated by using a sample
of size n.)


Solution


We know from earlier chapters that $E(\bar{Y}_n) = \mu$ and $V(\bar{Y}_n) = \sigma^2/n$. Because $\bar{Y}_n$ is
unbiased for $\mu$ and $V(\bar{Y}_n) \to 0$ as $n \to \infty$, Theorem 9.1 establishes that $\bar{Y}_n$ is a consistent
estimator of $\mu$. Equivalently, we may say that $\bar{Y}_n$ converges in probability to $\mu$.

The fact that $\bar{Y}_n$ is consistent for $\mu$, or converges in probability to $\mu$, is sometimes
referred to as the law of large numbers. It provides the theoretical justification
for the averaging process employed by many experimenters to obtain precision in
measurements. For example, an experimenter may take the average of the weights of
many animals to obtain a more precise estimate of the average weight of animals of
this species. The experimenter's feeling, a feeling confirmed by Theorem 9.1, is that
the average of many independently selected weights should be quite close to the true
mean weight with high probability.
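
The law of large numbers can also be observed empirically. The Python sketch below draws repeated samples and reports the share of sample means that land within $\varepsilon$ of $\mu$. The exponential population with mean 2, the tolerance 0.2, the function name `fraction_within`, and the seed are arbitrary assumptions made only for illustration.

```python
import random
import statistics


def fraction_within(n, eps=0.2, mu=2.0, reps=1000, seed=0):
    """Share of simulated samples of size n whose mean lies within eps of mu.

    Observations are exponential with mean mu (an illustrative choice of population).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ybar = statistics.fmean(rng.expovariate(1.0 / mu) for _ in range(n))
        hits += abs(ybar - mu) <= eps
    return hits / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, fraction_within(n))
```

The reported share is a Monte Carlo estimate of $P(|\bar{Y}_n - \mu| \le \varepsilon)$ and should move toward 1 as n increases.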


In Section 8.3, we considered an intuitive estimator for $\mu_1 - \mu_2$, the difference in
the means of two populations. The estimator discussed at that time was $\bar{Y}_1 - \bar{Y}_2$, the
difference in the means of independent random samples selected from two populations.
The results of Theorem 9.2 will be very useful in establishing the consistency
of such estimators.


THEOREM 9.2


Suppose that $\hat\theta_n$ converges in probability to $\theta$ and that $\hat\theta'_n$ converges in probability
to $\theta'$.


a $\hat\theta_n + \hat\theta'_n$ converges in probability to $\theta + \theta'$.

b $\hat\theta_n \times \hat\theta'_n$ converges in probability to $\theta \times \theta'$.

c If $\theta' \ne 0$, $\hat\theta_n / \hat\theta'_n$ converges in probability to $\theta / \theta'$.

d If $g(\cdot)$ is a real-valued function that is continuous at $\theta$, then $g(\hat\theta_n)$ converges in
probability to $g(\theta)$.


The proof of Theorem 9.2 closely resembles the corresponding proof in the case
where $\{a_n\}$ and $\{b_n\}$ are sequences of real numbers converging to real limits a and b,
respectively. For example, if $a_n \to a$ and $b_n \to b$ then

\[
a_n + b_n \to a + b.
\]
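
To make the analogy concrete, here is one way the argument for part (a) might be written out. It is a sketch supplied for illustration, not the text's own proof, and it uses only the triangle inequality and Definition 9.2.

```latex
\begin{align*}
|(\hat\theta_n + \hat\theta'_n) - (\theta + \theta')|
  &\le |\hat\theta_n - \theta| + |\hat\theta'_n - \theta'|, \\
P\bigl(|(\hat\theta_n + \hat\theta'_n) - (\theta + \theta')| > \varepsilon\bigr)
  &\le P\bigl(|\hat\theta_n - \theta| > \varepsilon/2\bigr)
     + P\bigl(|\hat\theta'_n - \theta'| > \varepsilon/2\bigr)
  \longrightarrow 0 \quad (n \to \infty).
\end{align*}
```

The second line holds because a sum of two nonnegative terms can exceed $\varepsilon$ only if at least one of them exceeds $\varepsilon/2$. Each probability on the right tends to 0 by the assumed convergence in probability.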


EXAMPLE 9.3


Suppose that $Y_1, Y_2, \ldots, Y_n$ represent a random sample such that $E(Y_i) = \mu$,
$E(Y_i^2) = \mu'_2$, and $E(Y_i^4) = \mu'_4$ are all finite. Show that

\[
S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2
\]

is a consistent estimator of $\sigma^2 = V(Y_i)$. (Note: We use subscript n on both $S^2$ and $\bar{Y}$
to explicitly convey their dependence on the value of the sample size n.)


Solution


We have seen in earlier chapters that $S^2$, now written as $S_n^2$, is

\[
S_n^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} Y_i^2 - n\bar{Y}_n^2 \right)
= \left( \frac{n}{n-1} \right) \left( \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}_n^2 \right).
\]

The statistic $(1/n)\sum_{i=1}^{n} Y_i^2$ is the average of n independent and identically distributed
random variables, with $E(Y_i^2) = \mu'_2$ and $V(Y_i^2) = \mu'_4 - (\mu'_2)^2 < \infty$. By the law
of large numbers (Example 9.2), we know that $(1/n)\sum_{i=1}^{n} Y_i^2$ converges in probability
to $\mu'_2$.

Example 9.2 also implies that $\bar{Y}_n$ converges in probability to $\mu$. Because the
function $g(x) = x^2$ is continuous for all finite values of x, Theorem 9.2(d) implies
that $\bar{Y}_n^2$ converges in probability to $\mu^2$. It then follows from Theorem 9.2(a) that

\[
\frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}_n^2
\]

converges in probability to $\mu'_2 - \mu^2 = \sigma^2$. Because $n/(n-1)$ is a sequence of constants
converging to 1 as $n \to \infty$, we can conclude that $S_n^2$ converges in probability
to $\sigma^2$. Equivalently, $S_n^2$, the sample variance, is a consistent estimator for $\sigma^2$, the
population variance.
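
The conclusion of Example 9.3 can be illustrated by computing $S_n^2$ on growing prefixes of one long simulated sample. In the Python sketch below, the normal population with $\sigma = 3$, the sample sizes, the seed, and the name `sample_variance_path` are assumptions chosen only for demonstration.

```python
import random
import statistics


def sample_variance_path(sizes=(10, 100, 1000, 10000), sigma=3.0, seed=42):
    """Return {n: S_n^2} computed from the first n values of one simulated normal sample."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, sigma) for _ in range(max(sizes))]
    # statistics.variance uses the divisor n - 1, matching the definition of S_n^2.
    return {n: statistics.variance(data[:n]) for n in sizes}


if __name__ == "__main__":
    for n, s2 in sample_variance_path().items():
        print(n, round(s2, 3))
```

With $\sigma^2 = 9$, the printed values should settle near 9 as n increases, although any single path can wander for small n.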


In Section 8.6, we considered large-sample confidence intervals for some parameters
of practical interest. In particular, if $Y_1, Y_2, \ldots, Y_n$ is a random sample from
any distribution with mean $\mu$ and variance $\sigma^2$, we established that

\[
\bar{Y} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
\]

is a valid large-sample confidence interval with confidence coefficient approximately
equal to $(1 - \alpha)$. If $\sigma^2$ is known, this interval can and should be calculated. However,
if $\sigma^2$ is not known but the sample size is large, we recommended substituting S for $\sigma$
in the calculation because this entails no significant loss of accuracy. The following
theorem provides the theoretical justification for these claims.


THEOREM 9.3

Suppose that $U_n$ has a distribution function that converges to a standard normal
distribution function as $n \to \infty$. If $W_n$ converges in probability to 1, then the
distribution function of $U_n / W_n$ converges to a standard normal distribution
function.


This result follows from a general result known as Slutsky's theorem (Serfling,
2002). The proof of this result is beyond the scope of this text. However, the usefulness
of the result is illustrated in the following example.


EXAMPLE 9.4


Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size n from a distribution with
$E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Define $S_n^2$ as

\[
S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2.
\]

Show that the distribution function of

\[
\sqrt{n} \left( \frac{\bar{Y}_n - \mu}{S_n} \right)
\]

converges to a standard normal distribution function.


Solution


In Example 9.3, we showed that $S_n^2$ converges in probability to $\sigma^2$. Notice that
$g(x) = +\sqrt{x/c}$ is a continuous function of x if both x and c are positive. Hence, it follows
from Theorem 9.2(d) that $S_n/\sigma = +\sqrt{S_n^2/\sigma^2}$ converges in probability to 1. We also
know from the central limit theorem (Theorem 7.4) that the distribution function of

\[
U_n = \sqrt{n} \left( \frac{\bar{Y}_n - \mu}{\sigma} \right)
\]

converges to a standard normal distribution function. Therefore, Theorem 9.3 implies
that the distribution function of

\[
\frac{\sqrt{n}\,(\bar{Y}_n - \mu)/\sigma}{S_n/\sigma} = \sqrt{n} \left( \frac{\bar{Y}_n - \mu}{S_n} \right)
\]

converges to a standard normal distribution function.
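
A quick simulation can make the result of Example 9.4 tangible for a clearly non-normal population. The Python sketch below estimates how often $|\sqrt{n}(\bar{Y}_n - \mu)/S_n| \le 1.96$ when the data are exponential. The population, the sample size, the number of replications, and both function names are illustrative assumptions.

```python
import math
import random


def studentized_mean(sample, mu):
    """Compute sqrt(n) * (ybar - mu) / s for one sample."""
    n = len(sample)
    ybar = math.fsum(sample) / n
    s2 = math.fsum((y - ybar) ** 2 for y in sample) / (n - 1)
    return math.sqrt(n) * (ybar - mu) / math.sqrt(s2)


def share_within_1_96(n, mu=1.0, reps=2000, seed=7):
    """Monte Carlo estimate of P(|studentized mean| <= 1.96) for exponential data with mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        hits += abs(studentized_mean(sample, mu)) <= 1.96
    return hits / reps


if __name__ == "__main__":
    for n in (5, 30, 200):
        print(n, share_within_1_96(n))
```

If the standard normal approximation is good, the share should be close to 0.95. For small n and skewed data it typically falls short, and the gap narrows as n grows.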


The result of Example 9.4 tells us that, when n is large, $\sqrt{n}(\bar{Y}_n - \mu)/S_n$ has
approximately a standard normal distribution whatever is the form of the distribution
from which the sample is taken. If the sample is taken from a normal distribution, the
results of Chapter 7 imply that $t = \sqrt{n}(\bar{Y}_n - \mu)/S_n$ has a t distribution with $n - 1$
degrees of freedom (df). Combining this information, we see that, if a large sample is
taken from a normal distribution, the distribution function of $t = \sqrt{n}(\bar{Y}_n - \mu)/S_n$ can
be approximated by a standard normal distribution function. That is, as n gets large
and hence as the number of degrees of freedom gets large, the t-distribution function
converges to the standard normal distribution function.

If we obtain a large sample from any distribution, we know from Example 9.4
that $\sqrt{n}(\bar{Y}_n - \mu)/S_n$ has approximately a standard normal distribution. Therefore, it
follows that

\[
P\left[ -z_{\alpha/2} \le \sqrt{n} \left( \frac{\bar{Y}_n - \mu}{S_n} \right) \le z_{\alpha/2} \right] \approx 1 - \alpha.
\]

If we manipulate the inequalities in the probability statement to isolate $\mu$ in the middle,
we obtain

\[
P\left[ \bar{Y}_n - z_{\alpha/2} \left( \frac{S_n}{\sqrt{n}} \right) \le \mu \le \bar{Y}_n + z_{\alpha/2} \left( \frac{S_n}{\sqrt{n}} \right) \right] \approx 1 - \alpha.
\]

Thus, $\bar{Y}_n \pm z_{\alpha/2}(S_n/\sqrt{n})$ forms a valid large-sample confidence interval for $\mu$, with
confidence coefficient approximately equal to $1 - \alpha$. Similarly, Theorem 9.3 can be
applied to show that

\[
\hat{p}_n \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_n \hat{q}_n}{n}}
\]

is a valid large-sample confidence interval for p with confidence coefficient approximately
equal to $1 - \alpha$.
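
The two large-sample intervals translate directly into code. The Python sketch below uses `statistics.NormalDist` from the standard library to obtain $z_{\alpha/2}$. The function names `mean_interval` and `proportion_interval` are hypothetical, and the sketch assumes n is large enough for the normal approximation to be reasonable.

```python
import math
from statistics import NormalDist


def mean_interval(sample, alpha=0.05):
    """Large-sample interval ybar +/- z_{alpha/2} * s / sqrt(n) for a population mean."""
    n = len(sample)
    ybar = math.fsum(sample) / n
    s = math.sqrt(math.fsum((y - ybar) ** 2 for y in sample) / (n - 1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


def proportion_interval(successes, n, alpha=0.05):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n) for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    print(mean_interval([1, 2, 3, 4, 5] * 20))
    print(proportion_interval(50, 100))
```

Both functions substitute the sample quantity ($S_n$ or $\hat{p}_n\hat{q}_n$) for the unknown population quantity, which is the substitution that Theorem 9.3 justifies for large n.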

In this section, we have seen that the property of consistency tells us something
about the distance between an estimator and the quantity being estimated. We have
seen that, when the sample size is large, $\bar{Y}_n$ is close to $\mu$, and $S_n^2$ is close to $\sigma^2$, with
high probability. We will see other examples of consistent estimators in the exercises
and later in the chapter.

In this section, we have used the notation $\bar{Y}_n$, $S_n^2$, $\hat{p}_n$, and, in general, $\hat\theta_n$ to explicitly
convey the dependence of the estimators on the sample size n. We needed to do so
because we were interested in computing

\[
\lim_{n \to \infty} P(|\hat\theta_n - \theta| \le \varepsilon).
\]

If this limit is 1, then $\hat\theta_n$ is a "consistent" estimator for $\theta$ (more precisely, $\hat\theta_n$ is a consistent
sequence of estimators for $\theta$). Unfortunately, this notation makes our estimators look
overly complicated. Henceforth, we will revert to the notation $\hat\theta$ as our estimator for $\theta$
and not explicitly display the dependence of the estimator on n. The dependence of $\hat\theta$
on the sample size n is always implicit and should be used whenever the consistency
of the estimator is considered.


Exercises 


9.9 Applet Exercise How was Figure 9.1 obtained? Access the applet PointSingle at
www.thomsonedu.com/statistics/wackerly. The top applet will generate a sequence of Bernoulli
trials [$X_i = 1, 0$ with $p(1) = p$, $p(0) = 1 - p$] with p = .5, a scenario equivalent to successively
tossing a balanced coin. Let $Y_n = \sum_{i=1}^{n} X_i$ = the number of 1s in the first n trials and
$\hat{p}_n = Y_n/n$. For each n, the applet computes $\hat{p}_n$ and plots it versus the value of n.

a If $\hat{p}_5 = 2/5$, what value of $X_6$ will result in $\hat{p}_6 > \hat{p}_5$?

b Click the button “One Trial” a single time. Your first observation is either 0 or 1. Which
value did you obtain? What was the value of $\hat{p}_1$? Click the button “One Trial” several more
times. How many trials n have you simulated? What value of $\hat{p}_n$ did you observe? Is the
value close to .5, the true value of p? Is the graph a flat horizontal line? Why or why not?

c Click the button “100 Trials” a single time. What do you observe? Click the button
“100 Trials” repeatedly until the total number of trials is 1000. Is the graph that you
obtained identical to the one given in Figure 9.1? In what sense is it similar to the graph in
Figure 9.1?

d Based on the sample of size 1000, what is the value of $\hat{p}_{1000}$? Is this value what you expected
to observe?

e Click the button “Reset.” Click the button “100 Trials” ten times to generate another
sequence of values for $\hat{p}$. Comment.


9.10 Applet Exercise Refer to Exercise 9.9. Scroll down to the portion of the screen labeled
“Try different probabilities.” Use the button labeled “p =” in the lower right corner of the
display to change the value of p to a value other than .5.

a Click the button “One Trial” a few times. What do you observe?

b Click the button “100 Trials” a few times. What do you observe about the values of $\hat{p}_n$ as
the number of trials gets larger?


9.11 Applet Exercise Refer to Exercises 9.9 and 9.10. How can the results of several sequences of
Bernoulli trials be simultaneously plotted? Access the applet PointbyPoint. Scroll down until
you can view all six buttons under the top graph.

a Do not change the value of p from the preset value p = .5. Click the button “One Trial” a
few times to verify that you are obtaining a result similar to those obtained in Exercise 9.9.
Click the button “5 Trials” until you have generated a total of 50 trials. What is the value
of $\hat{p}_{50}$ that you obtained at the end of this first sequence of 50 trials?

b Click the button “New Sequence.” The color of your initial graph changes from red to
green. Click the button “5 Trials” a few times. What do you observe? Is the graph the same
as the one you observed in part (a)? In what sense is it similar?

c Click the button “New Sequence.” Generate a new sequence of 50 trials. Repeat until you
have generated five sequences. Are the paths generated by the five sequences identical? In
what sense are they similar?


9.12 Applet Exercise Refer to Exercise 9.11. What happens if each sequence is longer? Scroll
down to the portion of the screen labeled “Longer Sequences of Trials.”

a Repeat the instructions in parts (a)-(c) of Exercise 9.11.

b What do you expect to happen if p is not 0.5? Use the button in the lower right corner to
change the value of p. Generate several sequences of trials. Comment.


9.13 Applet Exercise Refer to Exercises 9.9-9.12. Access the applet Point Estimation.

a Choose a value for p. Click the button “New Sequence” repeatedly. What do you observe?

b Scroll down to the portion of the applet labeled “More Trials.” Choose a value for p and
click the button “New Sequence” repeatedly. You will obtain up to 50 sequences, each
based on 1000 trials. How does the variability among the estimates change as a function of
the sample size? How is this manifested in the display that you obtained?


9.14 Applet Exercise Refer to Exercise 9.13. Scroll down to the portion of the applet labeled
“Mean of Normal Data.” Successive observed values of a standard normal random variable can
be generated and used to compute the value of the sample mean $\bar{Y}_n$. These successive values
are then plotted versus the respective sample size to obtain one “sample path.”

a Do you expect the values of $\bar{Y}_n$ to cluster around any particular value? What value?

b If the results of 50 sample paths are plotted, how do you expect the variability of the
estimates to change as a function of sample size?

c Click the button “New Sequence” several times. Did you observe what you expected based
on your answers to parts (a) and (b)?
9.15 Refer to Exercise 9.3. Show that both $\hat\theta_1$ and $\hat\theta_2$ are consistent estimators for $\theta$.


9.16 Refer to Exercise 9.5. Is $\hat\sigma_2^2$ a consistent estimator of $\sigma^2$?


9.17 Suppose that $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples from populations
with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Show that $\bar{X} - \bar{Y}$ is a
consistent estimator of $\mu_1 - \mu_2$.


9.18 In Exercise 9.17, suppose that the populations are normally distributed with $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
Show that

\[
\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2}{2n - 2}
\]

is a consistent estimator of $\sigma^2$.


9.19 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function

\[
f(y) = \begin{cases} \theta y^{\theta - 1}, & 0 < y < 1, \\ 0, & \text{elsewhere,} \end{cases}
\]

where $\theta > 0$. Show that $\bar{Y}$ is a consistent estimator of $\theta/(\theta + 1)$.


9.20 If Y has a binomial distribution with n trials and success probability p, show that Y/n is a 
consistent estimator of p. 


9.21 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n from a normal population with mean $\mu$
and variance $\sigma^2$. Assuming that n = 2k for some integer k, one possible estimator for $\sigma^2$ is
given by

\[
\hat\sigma^2 = \frac{1}{2k} \sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.
\]


a Show that $\hat\sigma^2$ is an unbiased estimator for $\sigma^2$.


b Show that $\hat\sigma^2$ is a consistent estimator for $\sigma^2$.


9.22 Refer to Exercise 9.21. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size n from a
Poisson-distributed population with mean $\lambda$. Again, assume that n = 2k for some integer k.
Consider

\[
\hat\lambda = \frac{1}{2k} \sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.
\]


a Show that $\hat\lambda$ is an unbiased estimator for $\lambda$.


b Show that $\hat\lambda$ is a consistent estimator for $\lambda$.


9.23 Refer to Exercise 9.21. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size n from a
population for which the first four moments are finite. That is, $m'_1 = E(Y_1) < \infty$, $m'_2 =
E(Y_1^2) < \infty$, $m'_3 = E(Y_1^3) < \infty$, and $m'_4 = E(Y_1^4) < \infty$. (Note: This assumption is valid for
the normal and Poisson distributions in Exercises 9.21 and 9.22, respectively.) Again, assume
that n = 2k for some integer k. Consider

\[
\hat\sigma^2 = \frac{1}{2k} \sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.
\]

a Show that $\hat\sigma^2$ is an unbiased estimator for $\sigma^2$.
b Show that $\hat\sigma^2$ is a consistent estimator for $\sigma^2$.
c Why did you need the assumption that $m'_4 = E(Y_1^4) < \infty$?


9.24 Let $Y_1, Y_2, Y_3, \ldots, Y_n$ be independent standard normal random variables.
a What is the distribution of $\sum_{i=1}^{n} Y_i^2$?
b Let $W_n = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$. Does $W_n$ converge in probability to some constant? If so, what is the
value of the constant?


9.25 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a normal distribution with
mean $\mu$ and variance 1. Consider the first observation $Y_1$ as an estimator for $\mu$.

a Show that $Y_1$ is an unbiased estimator for $\mu$.

b Find $P(|Y_1 - \mu| \le 1)$.

c Look at the basic definition of consistency given in Definition 9.2. Based on the result of
part (b), is $Y_1$ a consistent estimator for $\mu$?


*9.26 It is sometimes relatively easy to establish consistency or lack of consistency by appealing
directly to Definition 9.2, evaluating $P(|\hat\theta_n - \theta| \le \varepsilon)$ directly, and then showing that
$\lim_{n \to \infty} P(|\hat\theta_n - \theta| \le \varepsilon) = 1$. Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from
a uniform distribution on the interval $(0, \theta)$. If $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$, we showed in
Exercise 6.74 that the probability distribution function of $Y_{(n)}$ is given by

\[
F_{(n)}(y) = \begin{cases} 0, & y < 0, \\ (y/\theta)^n, & 0 \le y \le \theta, \\ 1, & y > \theta. \end{cases}
\]

a For each $n \ge 1$ and every $\varepsilon > 0$, it follows that $P(|Y_{(n)} - \theta| \le \varepsilon) = P(\theta - \varepsilon \le Y_{(n)} \le
\theta + \varepsilon)$. If $\varepsilon > \theta$, verify that $P(\theta - \varepsilon \le Y_{(n)} \le \theta + \varepsilon) = 1$ and that, for every positive
$\varepsilon < \theta$, we obtain $P(\theta - \varepsilon \le Y_{(n)} \le \theta + \varepsilon) = 1 - [(\theta - \varepsilon)/\theta]^n$.

b Using the result from part (a), show that $Y_{(n)}$ is a consistent estimator for $\theta$ by showing
that, for every $\varepsilon > 0$, $\lim_{n \to \infty} P(|Y_{(n)} - \theta| \le \varepsilon) = 1$.


*9.27 Use the method described in Exercise 9.26 to show that, if $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ when
$Y_1, Y_2, \ldots, Y_n$ are independent uniform random variables on the interval $(0, \theta)$, then $Y_{(1)}$ is not
a consistent estimator for $\theta$. [Hint: Based on the methods of Section 6.7, $Y_{(1)}$ has the distribution
function

\[
F_{(1)}(y) = \begin{cases} 0, & y < 0, \\ 1 - (1 - y/\theta)^n, & 0 \le y \le \theta, \\ 1, & y > \theta.] \end{cases}
\]


*9.28 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a Pareto distribution (see
Exercise 6.18). Then the methods of Section 6.7 imply that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ has the
distribution function given by

\[
F_{(1)}(y) = \begin{cases} 0, & y \le \beta, \\ 1 - (\beta/y)^{\alpha n}, & y > \beta. \end{cases}
\]

Use the method described in Exercise 9.26 to show that $Y_{(1)}$ is a consistent estimator of $\beta$.


*9.29 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a power family distribution (see
Exercise 6.17). Then the methods of Section 6.7 imply that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ has
the distribution function given by

\[
F_{(n)}(y) = \begin{cases} 0, & y < 0, \\ (y/\theta)^{\alpha n}, & 0 \le y \le \theta, \\ 1, & y > \theta. \end{cases}
\]

Use the method described in Exercise 9.26 to show that $Y_{(n)}$ is a consistent estimator of $\theta$.


9.30 Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables, each with probability density function

\[
f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that $\bar{Y}$ converges in probability to some constant and find the constant.


9.31 If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a gamma distribution with parameters $\alpha$ and
$\beta$, show that $\bar{Y}$ converges in probability to some constant and find the constant.


9.32 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function

\[
f(y) = \begin{cases} \dfrac{2}{y^2}, & y \ge 2, \\ 0, & \text{elsewhere.} \end{cases}
\]

Does the law of large numbers apply to $\bar{Y}$ in this case? Why or why not?


9.33 An experimenter wishes to compare the numbers of bacteria of types A and B in samples of
water. A total of n independent water samples are taken, and counts are made for each sample.
Let $X_i$ denote the number of type A bacteria and $Y_i$ denote the number of type B bacteria for
sample i. Assume that the two bacteria types are sparsely distributed within a water sample so
that $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ can be considered independent random samples from
Poisson distributions with means $\lambda_1$ and $\lambda_2$, respectively. Suggest an estimator of $\lambda_1/(\lambda_1 + \lambda_2)$.
What properties does your estimator have?


9.34 The Rayleigh density function is given by

\[
f(y) = \begin{cases} \left( \dfrac{2y}{\theta} \right) e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]

In Exercise 6.34(a), you established that $Y^2$ has an exponential distribution with mean $\theta$.
If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Rayleigh distribution, show that $W_n =
\frac{1}{n}\sum_{i=1}^{n} Y_i^2$ is a consistent estimator for $\theta$.


9.35 Let $Y_1, Y_2, \ldots$ be a sequence of random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma_i^2$. Notice
that the $\sigma_i^2$'s are not all equal.

a What is $E(\bar{Y}_n)$?
b What is $V(\bar{Y}_n)$?
c Under what condition (on the $\sigma_i^2$'s) can Theorem 9.1 be applied to show that $\bar{Y}_n$ is a
consistent estimator for $\mu$?


9.36 Suppose that Y has a binomial distribution based on n trials and success probability p. Then
$\hat{p}_n = Y/n$ is an unbiased estimator of p. Use Theorem 9.3 to prove that the distribution of
$(\hat{p}_n - p)/\sqrt{\hat{p}_n \hat{q}_n / n}$ converges to a standard normal distribution. [Hint: Write Y as we did in
Section 7.5.]


9.4 Sufficiency


Up to this point, we have chosen estimators on the basis of intuition. Thus, we chose
$\bar{Y}$ and $S^2$ as the estimators of the mean and variance, respectively, of the normal
distribution. (It seems like these should be good estimators of the population parameters.)
We have seen that it is sometimes desirable to use estimators that are unbiased.
Indeed, $\bar{Y}$ and $S^2$ have been shown to be unbiased estimators of the population mean
$\mu$ and variance $\sigma^2$, respectively. Notice that we have used the information in a sample
of size n to calculate the value of two statistics that function as estimators for the
parameters of interest. At this stage, the actual sample values are no longer important;
rather, we summarize the information in the sample that relates to the parameters of
interest by using the statistics $\bar{Y}$ and $S^2$. Has this process of summarizing or reducing
the data to the two statistics, $\bar{Y}$ and $S^2$, retained all the information about $\mu$ and $\sigma^2$
in the original set of n sample observations? Or has some information about these
parameters been lost or obscured through the process of reducing the data? In this
section, we present methods for finding statistics that in a sense summarize all the
information in a sample about a target parameter. Such statistics are said to have the
property of sufficiency; or more simply, they are called sufficient statistics. As we will
see in the next section, “good” estimators are (or can be made to be) functions of any
sufficient statistic. Indeed, sufficient statistics often can be used to develop estimators
that have the minimum variance among all unbiased estimators.

To illustrate the notion of a sufficient statistic, let us consider the outcomes of n
trials of a binomial experiment, $X_1, X_2, \ldots, X_n$, where

\[
X_i = \begin{cases} 1, & \text{if the ith trial is a success,} \\ 0, & \text{if the ith trial is a failure.} \end{cases}
\]

If p is the probability of success on any trial then, for $i = 1, 2, \ldots, n$,

\[
X_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } q = 1 - p. \end{cases}
\]

Suppose that we are given a value of $Y = \sum_{i=1}^{n} X_i$, the number of successes among
the n trials. If we know the value of Y, can we gain any further information about p
by looking at other functions of $X_1, X_2, \ldots, X_n$? One way to answer this question
is to look at the conditional distribution of $X_1, X_2, \ldots, X_n$, given Y:

\[
P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{P(X_1 = x_1, \ldots, X_n = x_n, Y = y)}{P(Y = y)}.
\]

The numerator on the right side of this expression is 0 if $\sum_{i=1}^{n} x_i \ne y$, and it is the
probability of an independent sequence of 0s and 1s with a total of y 1s and $(n - y)$
0s if $\sum_{i=1}^{n} x_i = y$. Also, the denominator is the binomial probability of exactly y
successes in n trials. Therefore, if $y = 0, 1, 2, \ldots, n$,

\[
P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) =
\begin{cases}
\dfrac{p^y (1-p)^{n-y}}{\binom{n}{y} p^y (1-p)^{n-y}} = \dfrac{1}{\binom{n}{y}}, & \text{if } \sum_{i=1}^{n} x_i = y, \\[1ex]
0, & \text{otherwise.}
\end{cases}
\]

It is important to note that the conditional distribution of $X_1, X_2, \ldots, X_n$, given Y,
does not depend upon p. That is, once Y is known, no other function of $X_1, X_2, \ldots, X_n$
will shed additional light on the possible value of p. In this sense, Y contains all the
information about p. Therefore, the statistic Y is said to be sufficient for p. We
generalize this idea in the following definition.


DEFINITION 9.3


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a probability distribution with
unknown parameter $\theta$. Then the statistic $U = g(Y_1, Y_2, \ldots, Y_n)$ is said to be
sufficient for $\theta$ if the conditional distribution of $Y_1, Y_2, \ldots, Y_n$, given U, does
not depend on $\theta$.


In many previous discussions, we have considered the probability function $p(y)$
associated with a discrete random variable [or the density function $f(y)$ for a continuous
random variable] to be functions of the argument y only. Our future discussions
will be simplified if we adopt notation that will permit us to explicitly display the
fact that the distribution associated with a random variable Y often depends on the
value of a parameter $\theta$. If Y is a discrete random variable that has a probability mass
function that depends on the value of a parameter $\theta$, instead of $p(y)$ we use the
notation $p(y \mid \theta)$. Similarly, we will indicate the explicit dependence of the form of
a continuous density function on the value of a parameter $\theta$ by writing the density
function as $f(y \mid \theta)$ instead of the previously used $f(y)$.

Definition 9.3 tells us how to check whether a statistic is sufficient, but it does
not tell us how to find a sufficient statistic. Recall that in the discrete case the joint
distribution of discrete random variables $Y_1, Y_2, \ldots, Y_n$ is given by a probability
function $p(y_1, y_2, \ldots, y_n)$. If this joint probability function depends explicitly on
the value of a parameter $\theta$, we write it as $p(y_1, y_2, \ldots, y_n \mid \theta)$. This function gives
the probability or likelihood of observing the event $(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$
when the value of the parameter is $\theta$. In the continuous case when the joint distribution
of $Y_1, Y_2, \ldots, Y_n$ depends on a parameter $\theta$, we will write the joint density function as
$f(y_1, y_2, \ldots, y_n \mid \theta)$. Henceforth, it will be convenient to have a single name for the
function that defines the joint distribution of the variables $Y_1, Y_2, \ldots, Y_n$ observed
in a sample.


DEFINITION 9.4


Let $y_1, y_2, \ldots, y_n$ be sample observations taken on corresponding random
variables $Y_1, Y_2, \ldots, Y_n$ whose distribution depends on a parameter $\theta$. Then,
if $Y_1, Y_2, \ldots, Y_n$ are discrete random variables, the likelihood of the sample,
$L(y_1, y_2, \ldots, y_n \mid \theta)$, is defined to be the joint probability of $y_1, y_2, \ldots, y_n$.
If $Y_1, Y_2, \ldots, Y_n$ are continuous random variables, the likelihood
$L(y_1, y_2, \ldots, y_n \mid \theta)$ is defined to be the joint density evaluated at $y_1, y_2, \ldots, y_n$.


If the set of random variables $Y_1, Y_2, \ldots, Y_n$ denotes a random sample from a
discrete distribution with probability function $p(y \mid \theta)$, then

\[
L(y_1, y_2, \ldots, y_n \mid \theta) = p(y_1, y_2, \ldots, y_n \mid \theta)
= p(y_1 \mid \theta) \times p(y_2 \mid \theta) \times \cdots \times p(y_n \mid \theta),
\]

whereas if $Y_1, Y_2, \ldots, Y_n$ have a continuous distribution with density function
$f(y \mid \theta)$, then

\[
L(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1, y_2, \ldots, y_n \mid \theta)
= f(y_1 \mid \theta) \times f(y_2 \mid \theta) \times \cdots \times f(y_n \mid \theta).
\]

To simplify notation, we will sometimes denote the likelihood by $L(\theta)$ instead of by
$L(y_1, y_2, \ldots, y_n \mid \theta)$.

The following theorem relates the property of sufficiency to the likelihood $L(\theta)$.


THEOREM 9.4


Let U be a statistic based on the random sample $Y_1, Y_2, \ldots, Y_n$. Then U is a
sufficient statistic for the estimation of a parameter $\theta$ if and only if the likelihood
$L(\theta) = L(y_1, y_2, \ldots, y_n \mid \theta)$ can be factored into two nonnegative functions,

\[
L(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta) \times h(y_1, y_2, \ldots, y_n),
\]

where $g(u, \theta)$ is a function only of u and $\theta$ and $h(y_1, y_2, \ldots, y_n)$ is not a
function of $\theta$.


Although the proof of Theorem 9.4 (also known as the factorization criterion)
is beyond the scope of this book, we illustrate the usefulness of the theorem in the
following example.


EXAMPLE 9.5


Let $Y_1, Y_2, \ldots, Y_n$ be a random sample in which $Y_i$ possesses the probability density
function

\[
f(y_i \mid \theta) = \begin{cases} (1/\theta) e^{-y_i/\theta}, & 0 \le y_i < \infty, \\ 0, & \text{elsewhere,} \end{cases}
\]

where $\theta > 0$, $i = 1, 2, \ldots, n$. Show that $\bar{Y}$ is a sufficient statistic for the parameter $\theta$.


Solution


The likelihood L(@) of the sample is the joint density 


L(Cyi, M eia Wl = fOr, Jig sees yn |0) 
fOO x fo210) x +++ х fn 10) 


e "»/0 e78 eg ^|  g-M»w/0  g-ny[e 
— х XX = — 
Ө Ө Ө Ө" Ө" 
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9.37 


9.38 


Notice that L(@) is a function only of 0 and y and that if 


—ny/0 
gly, 0) = jn and Ah(yi ys... Уһ) = 1, 
then 
L1, y. Уа 10) = 80,0) x А0, yo... Yn). 
Hence, Theorem 9.4 implies that Y is a sufficient statistic for the parameter 0. Ё 
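
The factorization in Example 9.5 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the two small samples are hypothetical and chosen only so that both samples share the same mean. If the likelihood depends on the data only through $\bar{y}$, the two samples must give the same likelihood at every value of $\theta$.

```python
import math

def exp_likelihood(sample, theta):
    """Joint density of an i.i.d. exponential sample with mean theta."""
    return math.prod(math.exp(-y / theta) / theta for y in sample)

def g(ybar, n, theta):
    """The factor g(ybar, theta) = exp(-n * ybar / theta) / theta**n."""
    return math.exp(-n * ybar / theta) / theta ** n

# Two hypothetical samples with different observations but the same mean (2.0).
sample_a = [0.5, 1.5, 4.0]
sample_b = [2.0, 2.0, 2.0]

for theta in (0.5, 1.0, 3.0):
    la = exp_likelihood(sample_a, theta)
    lb = exp_likelihood(sample_b, theta)
    # Up to rounding, la, lb, and g(2.0, 3, theta) agree because h(y) = 1 here.
    print(theta, la, lb, g(2.0, 3, theta))
```

The three printed likelihood columns agree for each $\theta$, which is what the factorization with $h(y_1, y_2, \ldots, y_n) = 1$ predicts.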


Theorem 9.4 can be used to show that there are many possible sufficient statistics 
for any one population parameter. First of all, according to Definition 9.3 or the 
factorization criterion (Theorem 9.4), the random sample itself is a sufficient statistic. 
Second, if $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution with a density 
function with parameter $\theta$, then the set of order statistics $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$, 
which is a function of $Y_1, Y_2, \ldots, Y_n$, is sufficient for $\theta$. In Example 9.5, we decided 
that $\bar{Y}$ is a sufficient statistic for the estimation of $\theta$. Theorem 9.4 could also have been 
used to show that $\sum_{i=1}^{n} Y_i$ is another sufficient statistic. Indeed, for the exponential 
distribution described in Example 9.5, any statistic that is a one-to-one function of 
$\bar{Y}$ is a sufficient statistic. 

In our initial example of this section, involving the number of successes in $n$ trials, 
$Y = \sum_{i=1}^{n} X_i$ reduces the data $X_1, X_2, \ldots, X_n$ to a single value that remains 
sufficient for $p$. Generally, we would like to find a sufficient statistic that reduces the 
data in the sample as much as possible. Although many statistics are sufficient for the 
parameter $\theta$ associated with a specific distribution, application of the factorization 
criterion typically leads to a statistic that provides the "best" summary of the 
information in the data. In Example 9.5, this statistic is $\bar{Y}$ (or some one-to-one function of 
it). In the next section, we show how these sufficient statistics can be used to develop 
unbiased estimators with minimum variance. 


Exercises 


9.37 Let $X_1, X_2, \ldots, X_n$ denote $n$ independent and identically distributed Bernoulli random 
variables such that 

$$P(X_i = 1) = p \quad\text{and}\quad P(X_i = 0) = 1 - p,$$

for each $i = 1, 2, \ldots, n$. Show that $\sum_{i=1}^{n} X_i$ is sufficient for $p$ by using the factorization 
criterion given in Theorem 9.4. 

9.38 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal distribution with mean $\mu$ and 
variance $\sigma^2$. 

a If $\mu$ is unknown and $\sigma^2$ is known, show that $\bar{Y}$ is sufficient for $\mu$. 
b If $\mu$ is known and $\sigma^2$ is unknown, show that $\sum_{i=1}^{n} (Y_i - \mu)^2$ is sufficient for $\sigma^2$. 
c If $\mu$ and $\sigma^2$ are both unknown, show that $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$ are jointly sufficient for $\mu$ 
and $\sigma^2$. [Thus, it follows that $\bar{Y}$ and $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ or $\bar{Y}$ and $S^2$ are also jointly sufficient 
for $\mu$ and $\sigma^2$.] 


9.39 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Poisson distribution with parameter $\lambda$. 
Show by conditioning that $\sum_{i=1}^{n} Y_i$ is sufficient for $\lambda$. 

9.40 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Rayleigh distribution with parameter $\theta$. 
(Refer to Exercise 9.34.) Show that $\sum_{i=1}^{n} Y_i^2$ is sufficient for $\theta$. 

9.41 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Weibull distribution with known $m$ and 
unknown $\alpha$. (Refer to Exercise 6.26.) Show that $\sum_{i=1}^{n} Y_i^m$ is sufficient for $\alpha$. 

9.42 If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a geometric distribution with parameter $p$, 
show that $\bar{Y}$ is sufficient for $p$. 

9.43 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a 
power family distribution with parameters $\alpha$ and $\theta$. Then, by the result in Exercise 6.17, if 
$\alpha, \theta > 0$, 

$$
f(y \mid \alpha, \theta) =
\begin{cases}
\alpha y^{\alpha - 1} / \theta^{\alpha}, & 0 \le y \le \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

If $\theta$ is known, show that $\prod_{i=1}^{n} Y_i$ is sufficient for $\alpha$. 

9.44 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a 
Pareto distribution with parameters $\alpha$ and $\beta$. Then, by the result in Exercise 6.18, if $\alpha, \beta > 0$, 

$$
f(y \mid \alpha, \beta) =
\begin{cases}
\alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\
0, & \text{elsewhere.}
\end{cases}
$$

If $\beta$ is known, show that $\prod_{i=1}^{n} Y_i$ is sufficient for $\alpha$. 

9.45 Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a probability density function in the 
(one-parameter) exponential family so that 

$$
f(y \mid \theta) =
\begin{cases}
a(\theta)\, b(y)\, e^{-[c(\theta) d(y)]}, & a \le y \le b, \\
0, & \text{elsewhere,}
\end{cases}
$$

where $a$ and $b$ do not depend on $\theta$. Show that $\sum_{i=1}^{n} d(Y_i)$ is sufficient for $\theta$. 

9.46 If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponential distribution with mean $\beta$, show 
that $f(y \mid \beta)$ is in the exponential family and that $\bar{Y}$ is sufficient for $\beta$. 

9.47 Refer to Exercise 9.43. If $\theta$ is known, show that the power family of distributions is in the 
exponential family. What is a sufficient statistic for $\alpha$? Does this contradict your answer to 
Exercise 9.43? 

9.48 Refer to Exercise 9.44. If $\beta$ is known, show that the Pareto distribution is in the exponential 
family. What is a sufficient statistic for $\alpha$? Argue that there is no contradiction between your 
answer to this exercise and the answer you found in Exercise 9.44. 

*9.49 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution over the interval 
$(0, \theta)$. Show that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$. 

*9.50 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution over the interval 
$(\theta_1, \theta_2)$. Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ and $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ are jointly 
sufficient for $\theta_1$ and $\theta_2$. 

*9.51 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function 

$$
f(y \mid \theta) =
\begin{cases}
e^{-(y - \theta)}, & y \ge \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$. 

*9.52 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function 

$$
f(y \mid \theta) =
\begin{cases}
\dfrac{3y^2}{\theta^3}, & 0 \le y \le \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

Show that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$. 

*9.53 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function 

$$
f(y \mid \theta) =
\begin{cases}
\dfrac{2\theta^2}{y^3}, & \theta < y < \infty, \\
0, & \text{elsewhere.}
\end{cases}
$$

Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$. 

*9.54 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a 
power family distribution with parameters $\alpha$ and $\theta$. Then, as in Exercise 9.43, if $\alpha, \theta > 0$, 

$$
f(y \mid \alpha, \theta) =
\begin{cases}
\alpha y^{\alpha - 1} / \theta^{\alpha}, & 0 \le y \le \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

Show that $\max(Y_1, Y_2, \ldots, Y_n)$ and $\prod_{i=1}^{n} Y_i$ are jointly sufficient for $\alpha$ and $\theta$. 

*9.55 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a 
Pareto distribution with parameters $\alpha$ and $\beta$. Then, as in Exercise 9.44, if $\alpha, \beta > 0$, 

$$
f(y \mid \alpha, \beta) =
\begin{cases}
\alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\
0, & \text{elsewhere.}
\end{cases}
$$

Show that $\prod_{i=1}^{n} Y_i$ and $\min(Y_1, Y_2, \ldots, Y_n)$ are jointly sufficient for $\alpha$ and $\beta$. 


9.5 The Rao-Blackwell Theorem and Minimum-Variance Unbiased Estimation 


Sufficient statistics play an important role in finding good estimators for parameters. If 
$\hat{\theta}$ is an unbiased estimator for $\theta$ and if $U$ is a statistic that is sufficient for $\theta$, then there 
is a function of $U$ that is also an unbiased estimator for $\theta$ and has no larger variance 
than $\hat{\theta}$. If we seek unbiased estimators with small variances, we can restrict our search 
to estimators that are functions of sufficient statistics. The theoretical basis for the 
preceding remarks is provided in the following result, known as the Rao-Blackwell 
theorem. 


THEOREM 9.5 


The Rao-Blackwell Theorem Let $\hat{\theta}$ be an unbiased estimator for $\theta$ such that 
$V(\hat{\theta}) < \infty$. If $U$ is a sufficient statistic for $\theta$, define $\hat{\theta}^* = E(\hat{\theta} \mid U)$. Then, for 
all $\theta$, 

$$E(\hat{\theta}^*) = \theta \quad\text{and}\quad V(\hat{\theta}^*) \le V(\hat{\theta}).$$


Proof 


Because $U$ is sufficient for $\theta$, the conditional distribution of any statistic 
(including $\hat{\theta}$), given $U$, does not depend on $\theta$. Thus, $\hat{\theta}^* = E(\hat{\theta} \mid U)$ is not a 
function of $\theta$ and is therefore a statistic. 

Recall Theorems 5.14 and 5.15 where we considered how to find means 
and variances of random variables by using conditional means and variances. 
Because $\hat{\theta}$ is an unbiased estimator for $\theta$, Theorem 5.14 implies that 

$$E(\hat{\theta}^*) = E[E(\hat{\theta} \mid U)] = E(\hat{\theta}) = \theta.$$

Thus, $\hat{\theta}^*$ is an unbiased estimator for $\theta$. 
Theorem 5.15 implies that 

$$
\begin{aligned}
V(\hat{\theta}) &= V[E(\hat{\theta} \mid U)] + E[V(\hat{\theta} \mid U)] \\
&= V(\hat{\theta}^*) + E[V(\hat{\theta} \mid U)].
\end{aligned}
$$

Because $V(\hat{\theta} \mid U = u) \ge 0$ for all $u$, it follows that $E[V(\hat{\theta} \mid U)] \ge 0$ and 
therefore that $V(\hat{\theta}) \ge V(\hat{\theta}^*)$, as claimed. 
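
The variance reduction promised by the theorem can be seen exactly in a small case. The sketch below is illustrative only and rests on assumptions not in the text: a Bernoulli sample with hypothetical values $n = 5$ and $p = 0.3$, the crude unbiased estimator $\hat{\theta} = Y_1$, and the sufficient statistic $U = \sum Y_i$. By symmetry $E(Y_1 \mid U = u) = u/n$, so conditioning turns $Y_1$ into the sample mean. The function names are made up for this illustration.

```python
from itertools import product

def exact_mean_var(estimator, n, p):
    """Exact mean and variance of estimator(sample) over all 2**n Bernoulli samples."""
    mean = 0.0
    second_moment = 0.0
    for sample in product((0, 1), repeat=n):
        k = sum(sample)
        prob = p ** k * (1 - p) ** (n - k)
        value = estimator(sample)
        mean += prob * value
        second_moment += prob * value ** 2
    return mean, second_moment - mean ** 2

def crude(sample):
    """Unbiased for p, but uses only the first observation."""
    return sample[0]

def rao_blackwellized(sample):
    """E(Y_1 | U = u) = u / n, where U is the sum of the sample."""
    return sum(sample) / len(sample)

n, p = 5, 0.3  # hypothetical values chosen for the illustration
crude_mean, crude_var = exact_mean_var(crude, n, p)
rb_mean, rb_var = exact_mean_var(rao_blackwellized, n, p)
print(crude_mean, crude_var)  # close to p and p(1 - p)
print(rb_mean, rb_var)        # close to p and p(1 - p) / n
```

Both estimators have mean $p$, but the conditioned estimator has variance $p(1 - p)/n$ rather than $p(1 - p)$, in line with $V(\hat{\theta}^*) \le V(\hat{\theta})$.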


Theorem 9.5 implies that an unbiased estimator for $\theta$ with a small variance is or 
can be made to be a function of a sufficient statistic. If we have an unbiased estimator 
for $\theta$, we might be able to improve it by using the result in Theorem 9.5. It might 
initially seem that the Rao-Blackwell theorem could be applied once to get a better 
unbiased estimator and then reapplied to the resulting new estimator to get an even 
better unbiased estimator. If we apply the Rao-Blackwell theorem using the sufficient 
statistic $U$, then $\hat{\theta}^* = E(\hat{\theta} \mid U)$ will be a function of the statistic $U$, say, $\hat{\theta}^* = h(U)$. 
Suppose that we reapply the Rao-Blackwell theorem to $\hat{\theta}^*$ by using the same sufficient 
statistic $U$. Since, in general, $E(h(U) \mid U) = h(U)$, we see that by using the 
Rao-Blackwell theorem again, our "new" estimator is just $h(U) = \hat{\theta}^*$. That is, if we use 
the same sufficient statistic in successive applications of the Rao-Blackwell theorem, 
we gain nothing after the first application. The only way that successive applications 
can lead to better unbiased estimators is if we use a different sufficient statistic when 
the theorem is reapplied. Thus, it is unnecessary to use the Rao-Blackwell theorem 
successively if we use the right sufficient statistic in our initial application. 

Because many statistics are sufficient for a parameter $\theta$ associated with a 
distribution, which sufficient statistic should we use when we apply this theorem? For the 
distributions that we discuss in this text, the factorization criterion typically identifies 
a statistic $U$ that best summarizes the information in the data about the 
parameter $\theta$. Such statistics are called minimal sufficient statistics. Exercise 9.66 introduces 
a method for determining a minimal sufficient statistic that might be of interest to 
some readers. In a few of the subsequent exercises, you will see that this method 
usually yields the same sufficient statistics as those obtained from the factorization 
criterion. In the cases that we consider, these statistics possess another property 
(completeness) that guarantees that, if we apply Theorem 9.5 using $U$, we not only get 
an estimator with a smaller variance but also actually obtain an unbiased estimator 
for $\theta$ with minimum variance. Such an estimator is called a minimum-variance 
unbiased estimator (MVUE). See Casella and Berger (2002), Hogg, Craig, and McKean 
(2005), or Mood, Graybill, and Boes (1974) for additional details. 

Thus, if we start with an unbiased estimator for a parameter $\theta$ and the sufficient 
statistic obtained through the factorization criterion, application of the Rao-Blackwell 
theorem typically leads to an MVUE for the parameter. Direct computation of 
conditional expectations can be difficult. However, if $U$ is the sufficient statistic 
that best summarizes the data and some function of $U$, say $h(U)$, can be found 
such that $E[h(U)] = \theta$, it follows that $h(U)$ is the MVUE for $\theta$. We illustrate this 
approach with several examples. 


EXAMPLE 9.6 


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution where $P(Y_i = 1) = p$ 
and $P(Y_i = 0) = 1 - p$, with $p$ unknown (such random variables are often called 
Bernoulli variables). Use the factorization criterion to find a sufficient statistic that 
best summarizes the data. Give an MVUE for $p$. 


Solution 


Notice that the preceding probability function can be written as 

$$P(Y_i = y_i) = p^{y_i} (1 - p)^{1 - y_i}, \qquad y_i = 0, 1.$$

Thus, the likelihood $L(p)$ is 

$$
\begin{aligned}
L(y_1, y_2, \ldots, y_n \mid p) &= p(y_1, y_2, \ldots, y_n \mid p) \\
&= p^{y_1}(1 - p)^{1 - y_1} \times p^{y_2}(1 - p)^{1 - y_2} \times \cdots \times p^{y_n}(1 - p)^{1 - y_n} \\
&= \underbrace{p^{\sum y_i} (1 - p)^{n - \sum y_i}}_{g(\sum y_i,\; p)} \times \underbrace{1}_{h(y_1, y_2, \ldots, y_n)}.
\end{aligned}
$$

According to the factorization criterion, $U = \sum_{i=1}^{n} Y_i$ is sufficient for $p$. This statistic 
best summarizes the information about the parameter $p$. Notice that $E(U) = np$, or 
equivalently, $E(U/n) = p$. Thus, $U/n = \bar{Y}$ is an unbiased estimator for $p$. Because 
this estimator is a function of the sufficient statistic $\sum_{i=1}^{n} Y_i$, the estimator $\hat{p} = \bar{Y}$ is 
the MVUE for $p$. 


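EXAMPLE 9.7 


Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the Weibull density 
function, given by 

$$
f(y \mid \theta) =
\begin{cases}
\left(\dfrac{2y}{\theta}\right) e^{-y^2/\theta}, & y > 0, \\
0, & \text{elsewhere.}
\end{cases}
$$

Find an MVUE for $\theta$. 


Solution 


We begin by using the factorization criterion to find the sufficient statistic that best 
summarizes the information about $\theta$. 

$$
\begin{aligned}
L(y_1, y_2, \ldots, y_n \mid \theta) &= f(y_1, y_2, \ldots, y_n \mid \theta) \\
&= \left(\frac{2}{\theta}\right)^{n} (y_1 \times y_2 \times \cdots \times y_n) \exp\left(-\frac{1}{\theta} \sum_{i=1}^{n} y_i^2\right) \\
&= \underbrace{\left(\frac{2}{\theta}\right)^{n} \exp\left(-\frac{1}{\theta} \sum_{i=1}^{n} y_i^2\right)}_{g(\sum y_i^2,\; \theta)} \times \underbrace{(y_1 \times y_2 \times \cdots \times y_n)}_{h(y_1, y_2, \ldots, y_n)}.
\end{aligned}
$$

Thus, $U = \sum_{i=1}^{n} Y_i^2$ is the minimal sufficient statistic for $\theta$. 
We now must find a function of this statistic that is unbiased for $\theta$. Letting $W = Y_i^2$, 
we have 

$$
f_W(w) = f(\sqrt{w})\, \frac{d(\sqrt{w})}{dw}
= \left(\frac{2}{\theta}\right)\left(\sqrt{w}\, e^{-w/\theta}\right)\left(\frac{1}{2\sqrt{w}}\right)
= \left(\frac{1}{\theta}\right) e^{-w/\theta}, \qquad w > 0.
$$

That is, $Y_i^2$ has an exponential distribution with parameter $\theta$. Because 

$$E(Y_i^2) = E(W) = \theta \quad\text{and}\quad E\left(\sum_{i=1}^{n} Y_i^2\right) = n\theta,$$

it follows that 

$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} Y_i^2$$

is an unbiased estimator of $\theta$ that is a function of the sufficient statistic $\sum_{i=1}^{n} Y_i^2$. 
Therefore, $\hat{\theta}$ is an MVUE of the Weibull parameter $\theta$. 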
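
As a rough check on the unbiasedness claimed in Example 9.7, the estimator can be averaged over simulated samples. This is a sketch under stated assumptions: the values $\theta = 3$, $n = 10$, the number of replications, and the seed are arbitrary illustrative choices, and the function names are hypothetical. The standard-library call `random.weibullvariate(scale, shape)` is used with shape 2 and scale $\sqrt{\theta}$, which corresponds to the density $(2y/\theta)e^{-y^2/\theta}$.

```python
import random

def weibull_mvue(sample):
    """Estimator from Example 9.7: the mean of the squared observations."""
    return sum(y * y for y in sample) / len(sample)

def average_estimate(theta, n, reps, seed=0):
    """Average the estimator over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    scale = theta ** 0.5  # shape 2, scale sqrt(theta) matches the example's density
    total = 0.0
    for _ in range(reps):
        sample = [rng.weibullvariate(scale, 2.0) for _ in range(n)]
        total += weibull_mvue(sample)
    return total / reps

# The average should land near theta = 3.0 if the estimator is unbiased.
print(average_estimate(theta=3.0, n=10, reps=20000))
```

A simulation of this kind only suggests unbiasedness; the derivation in the example is what establishes it.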


The following example illustrates the use of this technique for estimating two 
unknown parameters. 


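EXAMPLE 9.8 


Suppose $Y_1, Y_2, \ldots, Y_n$ denotes a random sample from a normal distribution with 
unknown mean $\mu$ and variance $\sigma^2$. Find the MVUEs for $\mu$ and $\sigma^2$. 


Solution 


Again, looking at the likelihood function, we have 

$$
\begin{aligned}
L(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) &= f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) \\
&= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right) \\
&= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left(-\frac{1}{2\sigma^2} \left(\sum_{i=1}^{n} y_i^2 - 2\mu \sum_{i=1}^{n} y_i + n\mu^2\right)\right) \\
&= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left(\frac{-n\mu^2}{2\sigma^2}\right) \exp\left[-\frac{1}{2\sigma^2} \left(\sum_{i=1}^{n} y_i^2 - 2\mu \sum_{i=1}^{n} y_i\right)\right].
\end{aligned}
$$

Thus, $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$, jointly, are sufficient statistics for $\mu$ and $\sigma^2$. 
We know from past work that $\bar{Y}$ is unbiased for $\mu$ and 

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \frac{1}{n - 1} \left[\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right]$$

is unbiased for $\sigma^2$. Because these estimators are functions of the statistics that best 
summarize the information about $\mu$ and $\sigma^2$, they are MVUEs for $\mu$ and $\sigma^2$. 

The factorization criterion, together with the Rao-Blackwell theorem, can also be 
used to find MVUEs for functions of the parameters associated with a distribution. 
We illustrate the technique in the following example. 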


EXAMPLE 9.9 


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the exponential density function 
given by 

$$
f(y \mid \theta) =
\begin{cases}
\left(\dfrac{1}{\theta}\right) e^{-y/\theta}, & y > 0, \\
0, & \text{elsewhere.}
\end{cases}
$$

Find an MVUE of $V(Y_i)$. 


Solution 


In Chapter 4, we determined that $E(Y_i) = \theta$ and that $V(Y_i) = \theta^2$. The factorization 
criterion implies that $\sum_{i=1}^{n} Y_i$ is the best sufficient statistic for $\theta$. In fact, $\bar{Y}$ is the 
MVUE of $\theta$. Therefore, it is tempting to use $\bar{Y}^2$ as an estimator of $\theta^2$. But 

$$E(\bar{Y}^2) = V(\bar{Y}) + [E(\bar{Y})]^2 = \frac{\theta^2}{n} + \theta^2 = \left(\frac{n + 1}{n}\right) \theta^2.$$

It follows that $\bar{Y}^2$ is a biased estimator for $\theta^2$. However, 

$$\left(\frac{n}{n + 1}\right) \bar{Y}^2$$

is an MVUE of $\theta^2$ because it is an unbiased estimator for $\theta^2$ and a function of the 
sufficient statistic. No other unbiased estimator of $\theta^2$ will have a smaller variance 
than this one. 
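
The bias of $\bar{Y}^2$ and the effect of the correction factor $n/(n + 1)$ can be illustrated by simulation. The sketch below is not from the text: $\theta = 2$, $n = 5$, the replication count, and the seed are arbitrary choices, and the function name is hypothetical. Note that `random.expovariate` takes the rate, which is the reciprocal of the mean $\theta$.

```python
import random

def average_estimators(theta, n, reps, seed=1):
    """Monte Carlo averages of ybar**2 and of (n / (n + 1)) * ybar**2."""
    rng = random.Random(seed)
    naive_total = 0.0
    corrected_total = 0.0
    for _ in range(reps):
        # expovariate expects the rate 1 / theta, not the mean theta
        ybar = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        naive_total += ybar ** 2
        corrected_total += n / (n + 1) * ybar ** 2
    return naive_total / reps, corrected_total / reps

naive, corrected = average_estimators(theta=2.0, n=5, reps=100000)
# Targets from the example: ((n + 1) / n) * theta**2 = 4.8 and theta**2 = 4.0.
print(naive, corrected)
```

With these settings the first average should sit near 4.8 and the second near 4.0, matching $E(\bar{Y}^2) = \left(\frac{n+1}{n}\right)\theta^2$.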


A sufficient statistic for a parameter $\theta$ often can be used to construct an exact 
confidence interval for $\theta$ if the probability distribution of the statistic can be found. 
The resulting intervals generally are the shortest that can be found with a specified 
confidence coefficient. We illustrate the technique with an example involving the 
Weibull distribution. 


EXAMPLE 9.10 


The following data, with measurements in hundreds of hours, represent the lengths 
of life of ten identical electronic components operating in a guidance control system 
for missiles: 


.637 1.531 .733 2.256 2.364 
1.601 .152 1.826 1.868 1.126 


The length of life of a component of this type is assumed to follow a Weibull 
distribution with density function given by 

$$
f(y \mid \theta) =
\begin{cases}
\left(\dfrac{2y}{\theta}\right) e^{-y^2/\theta}, & y > 0, \\
0, & \text{elsewhere.}
\end{cases}
$$

Use the data to construct a 95% confidence interval for $\theta$. 


Solution 


We saw in Example 9.7 that the sufficient statistic that best summarizes the information 
about $\theta$ is $\sum_{i=1}^{n} Y_i^2$. We will use this statistic to form a pivotal quantity for constructing 
the desired confidence interval. 

Recall from Example 9.7 that $W_i = Y_i^2$ has an exponential distribution with mean $\theta$. 
Now consider the transformation $T_i = 2W_i/\theta$. Then 

$$
f_T(t) = f_W\left(\frac{\theta t}{2}\right) \frac{d(\theta t/2)}{dt}
= \left(\frac{1}{\theta}\right) e^{-(\theta t/2)/\theta} \left(\frac{\theta}{2}\right)
= \left(\frac{1}{2}\right) e^{-t/2}, \qquad t > 0.
$$

Thus, for each $i = 1, 2, \ldots, n$, $T_i$ has a $\chi^2$ distribution with 2 df. Further, because the 
variables $Y_i$ are independent, the variables $T_i$ are independent, for $i = 1, 2, \ldots, n$. 
The sum of independent $\chi^2$ random variables has a $\chi^2$ distribution with degrees of 
freedom equal to the sum of the degrees of freedom of the variables in the sum. 
Therefore, the quantity 

$$\sum_{i=1}^{10} T_i = \frac{2}{\theta} \sum_{i=1}^{10} W_i = \frac{2}{\theta} \sum_{i=1}^{10} Y_i^2$$

has a $\chi^2$ distribution with 20 df. Thus, $(2/\theta) \sum_{i=1}^{10} Y_i^2$ 
is a pivotal quantity, and we can use the pivotal method (Section 8.5) to construct the 
desired confidence interval. 
From Table 6, Appendix 3, we can find two numbers $a$ and $b$ such that 

$$P\left(a \le \frac{2}{\theta} \sum_{i=1}^{10} Y_i^2 \le b\right) = .95.$$

Manipulating the inequality to isolate $\theta$ in the middle, we have 

$$
.95 = P\left(a \le \frac{2}{\theta} \sum_{i=1}^{10} Y_i^2 \le b\right)
= P\left(\frac{2 \sum_{i=1}^{10} Y_i^2}{b} \le \theta \le \frac{2 \sum_{i=1}^{10} Y_i^2}{a}\right).
$$

From Table 6, Appendix 3, the value that cuts off an area of .025 in the lower tail 
of the $\chi^2$ distribution with 20 df is $a = 9.591$. The value that cuts off an area of 
.025 in the upper tail of the same distribution is $b = 34.170$. For the preceding data, 
$\sum_{i=1}^{10} y_i^2 = 24.643$. Therefore, the 95% confidence interval for the Weibull parameter 
$\theta$ is 

$$\left(\frac{2(24.643)}{34.170},\ \frac{2(24.643)}{9.591}\right), \quad\text{or}\quad (1.442,\ 5.139).$$

This is a fairly wide interval for $\theta$, but it is based on only ten observations. 
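
The arithmetic of Example 9.10 is easy to reproduce. The following sketch uses the ten lifetimes and the two table values quoted in the example; the function and variable names are hypothetical, and the $\chi^2$ quantiles are copied from the text rather than computed.

```python
# Lifetimes from Example 9.10, in hundreds of hours.
lifetimes = [0.637, 1.531, 0.733, 2.256, 2.364,
             1.601, 0.152, 1.826, 1.868, 1.126]

# Chi-square quantiles for 2n = 20 df, as quoted in the example (Table 6, Appendix 3).
CHI2_LOWER_025 = 9.591
CHI2_UPPER_025 = 34.170

def weibull_theta_interval(data, lower_q, upper_q):
    """Pivotal interval (2*sum(y^2)/upper_q, 2*sum(y^2)/lower_q) for theta."""
    sum_sq = sum(y * y for y in data)
    return 2 * sum_sq / upper_q, 2 * sum_sq / lower_q, sum_sq

low, high, sum_sq = weibull_theta_interval(lifetimes, CHI2_LOWER_025, CHI2_UPPER_025)
print(round(sum_sq, 3), round(low, 3), round(high, 3))  # prints 24.643 1.442 5.139
```

The rounded values agree with the sum of squares and the interval endpoints reported in the example.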


In this section, we have seen that the Rao-Blackwell theorem implies that 
unbiased estimators with small variances are functions of sufficient statistics. Generally 
speaking, the factorization criterion presented in Section 9.4 can be applied to find 
sufficient statistics that best summarize the information contained in sample data 
about parameters of interest. For the distributions that we consider in this text, an 
MVUE for a target parameter $\theta$ can be found as follows. First, determine the best 
sufficient statistic, $U$. Then, find a function of $U$, $h(U)$, such that $E[h(U)] = \theta$. 

This method often works well. However, sometimes a best sufficient statistic is 
a fairly complicated function of the observable random variables in the sample. In 
cases like these, it may be difficult to find a function of the sufficient statistic that 
is an unbiased estimator for the target parameter. For this reason, two additional 
methods of finding estimators, the method of moments and the method of maximum 
likelihood, are presented in the next two sections. A third important method for 
estimation, the method of least squares, is the topic of Chapter 11. 


Exercises 


9.56 Refer to Exercise 9.38(b). Find an MVUE of $\sigma^2$. 

9.57 Refer to Exercise 9.18. Is the estimator of $\sigma^2$ given there an MVUE of $\sigma^2$? 

9.58 Refer to Exercise 9.40. Use $\sum_{i=1}^{n} Y_i^2$ to find an MVUE of $\theta$. 

9.59 The number of breakdowns $Y$ per day for a certain machine is a Poisson random variable with 
mean $\lambda$. The daily cost of repairing these breakdowns is given by $C = 3Y^2$. If $Y_1, Y_2, \ldots, Y_n$ 
denote the observed number of breakdowns for $n$ independently selected days, find an MVUE 
for $E(C)$. 

9.60 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function 

$$
f(y \mid \theta) =
\begin{cases}
\theta y^{\theta - 1}, & 0 < y < 1,\ \theta > 0, \\
0, & \text{elsewhere.}
\end{cases}
$$

a Show that this density function is in the (one-parameter) exponential family and that 
$\sum_{i=1}^{n} -\ln(Y_i)$ is sufficient for $\theta$. (See Exercise 9.45.) 

b If $W_i = -\ln(Y_i)$, show that $W_i$ has an exponential distribution with mean $1/\theta$. 

c Use methods similar to those in Example 9.10 to show that $2\theta \sum_{i=1}^{n} W_i$ has a $\chi^2$ distribution 
with $2n$ df. 

d Show that 

$$E\left(\frac{1}{2\theta \sum_{i=1}^{n} W_i}\right) = \frac{1}{2(n - 1)}.$$

[Hint: Recall Exercise 4.112.] 

e What is the MVUE for $\theta$? 

9.61 Refer to Exercise 9.49. Use $Y_{(n)}$ to find an MVUE of $\theta$. (See Example 9.1.) 

9.62 Refer to Exercise 9.51. Find a function of $Y_{(1)}$ that is an MVUE for $\theta$. 

9.63 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function 

$$
f(y \mid \theta) =
\begin{cases}
\dfrac{3y^2}{\theta^3}, & 0 \le y \le \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

In Exercise 9.52 you showed that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$. 

a Show that $Y_{(n)}$ has probability density function 

$$
f_{(n)}(y \mid \theta) =
\begin{cases}
\dfrac{3n y^{3n - 1}}{\theta^{3n}}, & 0 \le y \le \theta, \\
0, & \text{elsewhere.}
\end{cases}
$$

b Find the MVUE of $\theta$. 

9.64 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance 1. 

a Show that the MVUE of $\mu^2$ is $\widehat{\mu^2} = \bar{Y}^2 - 1/n$. 
b Derive the variance of $\widehat{\mu^2}$. 

9.65 In this exercise, we illustrate the direct use of the Rao-Blackwell theorem. Let $Y_1, Y_2, \ldots, Y_n$ 
be independent Bernoulli random variables with 

$$p(y_i \mid p) = p^{y_i} (1 - p)^{1 - y_i}, \qquad y_i = 0, 1.$$

That is, $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$. Find the MVUE of $p(1 - p)$, which is a 
term in the variance of $Y_i$ or $W = \sum_{i=1}^{n} Y_i$, by the following steps. 

a Let 

$$
T =
\begin{cases}
1, & \text{if } Y_1 = 1 \text{ and } Y_2 = 0, \\
0, & \text{otherwise.}
\end{cases}
$$

Show that $E(T) = p(1 - p)$. 

b Show that 

$$P(T = 1 \mid W = w) = \frac{w(n - w)}{n(n - 1)}.$$

c Show that 

$$E(T \mid W) = \frac{n}{n - 1} \left[\frac{W}{n} \left(1 - \frac{W}{n}\right)\right] = \frac{n}{n - 1}\, \bar{Y}(1 - \bar{Y})$$

and hence that $n\bar{Y}(1 - \bar{Y})/(n - 1)$ is the MVUE of $p(1 - p)$. 

*9.66 The likelihood function $L(y_1, y_2, \ldots, y_n \mid \theta)$ takes on different values depending on the 
arguments $(y_1, y_2, \ldots, y_n)$. A method for deriving a minimal sufficient statistic developed by 
Lehmann and Scheffé uses the ratio of the likelihoods evaluated at two points, $(x_1, x_2, \ldots, x_n)$ 
and $(y_1, y_2, \ldots, y_n)$: 

$$\frac{L(x_1, x_2, \ldots, x_n \mid \theta)}{L(y_1, y_2, \ldots, y_n \mid \theta)}.$$

Many times it is possible to find a function $g(x_1, x_2, \ldots, x_n)$ such that this ratio is free of the 
unknown parameter $\theta$ if and only if $g(x_1, x_2, \ldots, x_n) = g(y_1, y_2, \ldots, y_n)$. If such a function 
$g$ can be found, then $g(Y_1, Y_2, \ldots, Y_n)$ is a minimal sufficient statistic for $\theta$. 

a Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a Bernoulli distribution (see Example 9.6 
and Exercise 9.65) with $p$ unknown. 

i Show that 

$$\frac{L(x_1, x_2, \ldots, x_n \mid p)}{L(y_1, y_2, \ldots, y_n \mid p)} = \left(\frac{p}{1 - p}\right)^{\sum x_i - \sum y_i}.$$

ii Argue that for this ratio to be independent of $p$, we must have 

$$\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i = 0 \quad\text{or}\quad \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i.$$

iii Using the method of Lehmann and Scheffé, what is a minimal sufficient statistic for 
$p$? How does this sufficient statistic compare to the sufficient statistic derived in 
Example 9.6 by using the factorization criterion? 

b Consider the Weibull density discussed in Example 9.7. 

i Show that 

$$
\frac{L(x_1, x_2, \ldots, x_n \mid \theta)}{L(y_1, y_2, \ldots, y_n \mid \theta)}
= \left(\frac{x_1 x_2 \cdots x_n}{y_1 y_2 \cdots y_n}\right) \exp\left[-\frac{1}{\theta} \left(\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} y_i^2\right)\right].
$$

ii Argue that $\sum_{i=1}^{n} Y_i^2$ is a minimal sufficient statistic for $\theta$. 

*9.67 Refer to Exercise 9.66. Suppose that a sample of size $n$ is taken from a normal population 
with mean $\mu$ and variance $\sigma^2$. Show that $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$ jointly form minimal sufficient 
statistics for $\mu$ and $\sigma^2$. 

*9.68 Suppose that a statistic $U$ has a probability density function that is positive over the interval 
$a \le u \le b$ and suppose that the density depends on a parameter $\theta$ that can range over the 
interval $\alpha_1 \le \theta \le \alpha_2$. Suppose also that $g(u)$ is continuous for $u$ in the interval $[a, b]$. If 
$E[g(U) \mid \theta] = 0$ for all $\theta$ in the interval $[\alpha_1, \alpha_2]$ implies that $g(u)$ is identically zero, then the 
family of density functions $\{f_U(u \mid \theta),\ \alpha_1 \le \theta \le \alpha_2\}$ is said to be complete. (All statistics that 
we employed in Section 9.5 have complete families of density functions.) Suppose that $U$ is a 
sufficient statistic for $\theta$, and $g_1(U)$ and $g_2(U)$ are both unbiased estimators of $\theta$. Show that, if 
the family of density functions for $U$ is complete, $g_1(U)$ must equal $g_2(U)$, and thus there is 
a unique function of $U$ that is an unbiased estimator of $\theta$. 

Coupled with the Rao-Blackwell theorem, the property of completeness of $f_U(u \mid \theta)$, 
along with the sufficiency of $U$, assures us that there is a unique minimum-variance unbiased 
estimator (UMVUE) of $\theta$. 


9.6 The Method of Moments 


In this section, we will discuss one of the oldest methods for deriving point estimators: 
the method of moments. A more sophisticated method, the method of maximum 
likelihood, is the topic of Section 9.7. 

The method of moments is a very simple procedure for finding an estimator for 
one or more population parameters. Recall that the $k$th moment of a random variable, 
taken about the origin, is 

$$\mu'_k = E(Y^k).$$

The corresponding $k$th sample moment is the average 

$$m'_k = \frac{1}{n} \sum_{i=1}^{n} Y_i^k.$$

The method of moments is based on the intuitively appealing idea that sample 
moments should provide good estimates of the corresponding population moments. 
That is, $m'_k$ should be a good estimator of $\mu'_k$, for $k = 1, 2, \ldots$. Then because 
the population moments $\mu'_1, \mu'_2, \ldots, \mu'_k$ are functions of the population parameters, 
we can equate corresponding population and sample moments and solve for the 
desired estimators. Hence, the method of moments can be stated as follows. 


Method of Moments 

Choose as estimates those values of the parameters that are solutions of the 
equations $\mu'_k = m'_k$, for $k = 1, 2, \ldots, t$, where $t$ is the number of parameters 
to be estimated. 


EXAMPLE 9.11 


A random sample of $n$ observations, $Y_1, Y_2, \ldots, Y_n$, is selected from a population in 
which $Y_i$, for $i = 1, 2, \ldots, n$, possesses a uniform probability density function over 
the interval $(0, \theta)$ where $\theta$ is unknown. Use the method of moments to estimate the 
parameter $\theta$. 


Solution 


The value of $\mu'_1$ for a uniform random variable is 

$$\mu'_1 = \mu = \frac{\theta}{2}.$$

The corresponding first sample moment is 

$$m'_1 = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}.$$

Equating the corresponding population and sample moment, we obtain 

$$\mu'_1 = \frac{\theta}{2} = \bar{Y}.$$

The method-of-moments estimator for $\theta$ is the solution of the above equation. That 
is, $\hat{\theta} = 2\bar{Y}$. 


For the distributions that we consider in this text, the methods of Section 9.3 can 
be used to show that sample moments are consistent estimators of the corresponding 
population moments. Because the estimators obtained from the method of moments 
obviously are functions of the sample moments, estimators obtained using the method 
of moments are usually consistent estimators of their respective parameters. 


EXAMPLE 9.12 


Show that the estimator $\hat{\theta} = 2\bar{Y}$, derived in Example 9.11, is a consistent estimator 
for $\theta$. 


Solution 


In Example 9.1, we showed that $\hat{\theta} = 2\bar{Y}$ is an unbiased estimator for $\theta$ and that 
$V(\hat{\theta}) = \theta^2/(3n)$. Because $\lim_{n \to \infty} V(\hat{\theta}) = 0$, Theorem 9.1 implies that $\hat{\theta} = 2\bar{Y}$ is a 
consistent estimator for $\theta$. 

Although the estimator $\hat{\theta}$ derived in Example 9.11 is consistent, it is not 
necessarily the best estimator for $\theta$. Indeed, the factorization criterion yields $Y_{(n)} = 
\max(Y_1, Y_2, \ldots, Y_n)$ to be the best sufficient statistic for $\theta$. Thus, according to the 
Rao-Blackwell theorem, the method-of-moments estimator will have larger variance 
than an unbiased estimator based on $Y_{(n)}$. This, in fact, was shown to be the case in 
Example 9.1. 
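
The comparison between the two estimators can be illustrated by simulation. The sketch below makes several assumptions that are not in this passage: it takes $\left(\frac{n+1}{n}\right) Y_{(n)}$ as the unbiased estimator based on $Y_{(n)}$, and it uses arbitrary values $\theta = 1$, $n = 10$, 20000 replications, and a fixed seed. The function name is hypothetical.

```python
import random
from statistics import mean, pvariance

def compare_uniform_estimators(theta, n, reps, seed=2):
    """Simulate 2*ybar and ((n + 1)/n)*max(sample) for uniform(0, theta) samples."""
    rng = random.Random(seed)
    mom_values, max_values = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        mom_values.append(2 * sum(sample) / n)             # method of moments
        max_values.append((n + 1) / n * max(sample))       # unbiased multiple of Y_(n)
    return ((mean(mom_values), pvariance(mom_values)),
            (mean(max_values), pvariance(max_values)))

(mom_mean, mom_var), (max_mean, max_var) = compare_uniform_estimators(1.0, 10, 20000)
print(mom_mean, mom_var)  # theory: mean theta, variance theta**2 / (3n)
print(max_mean, max_var)  # theory: mean theta, variance theta**2 / (n(n + 2))
```

Both simulated means should be close to $\theta$, while the variance of the estimator based on $Y_{(n)}$ should be markedly smaller than that of $2\bar{Y}$.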


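EXAMPLE 9.13 


A random sample of $n$ observations, $Y_1, Y_2, \ldots, Y_n$, is selected from a population 
where $Y_i$, for $i = 1, 2, \ldots, n$, possesses a gamma probability density function with 
parameters $\alpha$ and $\beta$ (see Section 4.6 for the gamma probability density function). 
Find method-of-moments estimators for the unknown parameters $\alpha$ and $\beta$. 


Solution 


Because we seek estimators for two parameters $\alpha$ and $\beta$, we must equate two pairs 
of population and sample moments. 

The first two moments of the gamma distribution with parameters $\alpha$ and $\beta$ are (see 
the inside of the back cover of the text, if necessary) 

$$\mu'_1 = \mu = \alpha\beta \quad\text{and}\quad \mu'_2 = \sigma^2 + \mu^2 = \alpha\beta^2 + \alpha^2\beta^2.$$

Now equate these quantities to their corresponding sample moments and solve for $\hat{\alpha}$ 
and $\hat{\beta}$. Thus, 

$$
\begin{aligned}
\mu'_1 &= \alpha\beta = m'_1 = \bar{Y}, \\
\mu'_2 &= \alpha\beta^2 + \alpha^2\beta^2 = m'_2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2.
\end{aligned}
$$

From the first equation, we obtain $\hat{\beta} = \bar{Y}/\hat{\alpha}$. Substituting into the second equation 
and solving for $\hat{\alpha}$, we obtain 

$$\hat{\alpha} = \frac{\bar{Y}^2}{\left(\sum_{i=1}^{n} Y_i^2 / n\right) - \bar{Y}^2} = \frac{n\bar{Y}^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$

Substituting $\hat{\alpha}$ into the first equation, we obtain 

$$\hat{\beta} = \frac{\bar{Y}}{\hat{\alpha}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n\bar{Y}}.$$


The method-of-moments estimators $\hat{\alpha}$ and $\hat{\beta}$ in Example 9.13 are consistent. $\bar{Y}$ 
converges in probability to $E(Y_i) = \alpha\beta$, and $(1/n) \sum_{i=1}^{n} Y_i^2$ converges in probability 
to $E(Y_i^2) = \alpha\beta^2 + \alpha^2\beta^2$. Thus, 

$$\hat{\alpha} = \frac{\bar{Y}^2}{\frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}^2} \quad\text{is a consistent estimator of}\quad \frac{(\alpha\beta)^2}{\alpha\beta^2 + \alpha^2\beta^2 - (\alpha\beta)^2} = \alpha,$$

and 

$$\hat{\beta} = \frac{\bar{Y}}{\hat{\alpha}} \quad\text{is a consistent estimator of}\quad \frac{\alpha\beta}{\alpha} = \beta.$$

Using the factorization criterion, we can show $\sum_{i=1}^{n} Y_i$ and the product $\prod_{i=1}^{n} Y_i$ to be 
sufficient statistics for the gamma density function. Because the method-of-moments 
estimators $\hat{\alpha}$ and $\hat{\beta}$ are not functions of these sufficient statistics, we can find more 
efficient estimators for the parameters $\alpha$ and $\beta$. However, it is considerably more 
difficult to apply other methods to find estimators for these parameters. 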
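
The two formulas from Example 9.13 translate directly into a short function. This is a minimal sketch: the function name and the four-point data set are hypothetical and chosen only so that the arithmetic is easy to follow by hand ($\bar{y} = 2.5$, second sample moment $7.5$).

```python
def gamma_method_of_moments(sample):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample."""
    n = len(sample)
    ybar = sum(sample) / n
    m2 = sum(y * y for y in sample) / n   # second sample moment
    spread = m2 - ybar ** 2               # equals (1/n) * sum((y - ybar)**2)
    alpha_hat = ybar ** 2 / spread
    beta_hat = spread / ybar
    return alpha_hat, beta_hat

# Hypothetical data: ybar = 2.5, m2 = 7.5, so spread = 1.25.
alpha_hat, beta_hat = gamma_method_of_moments([1.0, 2.0, 3.0, 4.0])
print(alpha_hat, beta_hat)  # prints 5.0 0.5
```

The product $\hat{\alpha}\hat{\beta}$ recovers the sample mean, as the first moment equation requires. The function divides by the sample spread, so it assumes the observations are not all equal.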

To summarize, the method of moments finds estimators of unknown parameters 
by equating corresponding sample and population moments. The method is easy to 
employ and provides consistent estimators. However, the estimators derived by this 
method are often not functions of sufficient statistics. As a result, method-of-moments 
estimators are sometimes not very efficient. In many cases, the method-of-moments 
estimators are biased. The primary virtues of this method are its ease of use and that 
it sometimes yields estimators with reasonable properties. 


Exercises 
Let Y;, Y2,..., Y, denote a random sample from the probability density function 
(04-)y$, O<y<1;0>-1, 
fole) = | 
А elsewhere. 


Find an estimator for 6 by the method of moments. Show that the estimator is consistent. Is 
the estimator a function of the sufficient statistic — У)”, In(Y;) that we can obtain from the 
factorization criterion? What implications does this have? 


Suppose that Yı, Y2,..., Y, constitute a random sample from a Poisson distribution with 
mean A. Find the method-of-moments estimator of A. 


If Yi, Y2,..., Y, denote a random sample from the normal distribution with known mean 
ш = 0 and unknown variance o^, find the method-of-moments estimator of o°. 


If Yi, Y2,..., Y, denote a random sample from the normal distribution with mean jz and 
variance o”, find the method-of-moments estimators of ш and o°. 


An urn contains 0 black balls and N — 0 white balls. A sample of n balls is to be selected 
without replacement. Let Y denote the number of black balls in the sample. Show that (N/n)Y 
is the method-of-moments estimator of 0. 


Let Yi, Y2,..., Y, constitute a random sample from the probability density function given by 
= (Ө ), О<у<ф@ 
ҒОТӨ) = \\ в? а 
0, elsewhere. 


a Find an estimator for 0 by using the method of moments. 


b Isthis estimator a sufficient statistic for 0? 


9.75 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the probability density function given by

$$f(y \mid \theta) = \begin{cases} \dfrac{\Gamma(2\theta)}{[\Gamma(\theta)]^2}\, y^{\theta - 1}(1 - y)^{\theta - 1}, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the method-of-moments estimator for $\theta$.


9.76 Let $X_1, X_2, X_3, \ldots$ be independent Bernoulli random variables such that $P(X_i = 1) = p$ and
$P(X_i = 0) = 1 - p$ for each $i = 1, 2, 3, \ldots$. Let the random variable $Y$ denote the number of
trials necessary to obtain the first success, that is, the value of $i$ for which $X_i = 1$ first occurs.
Then $Y$ has a geometric distribution with $P(Y = y) = (1 - p)^{y-1} p$, for $y = 1, 2, 3, \ldots$. Find
the method-of-moments estimator of $p$ based on this single observation $Y$.


9.77 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed uniform random variables
on the interval $(0, 3\theta)$. Derive the method-of-moments estimator for $\theta$.


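9.78 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a
power family distribution with parameters $\alpha$ and $\theta = 3$. Then, as in Exercise 9.43, if $\alpha > 0$,

$$f(y \mid \alpha) = \begin{cases} \alpha y^{\alpha - 1}/3^{\alpha}, & 0 \le y \le 3, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $E(Y_1) = 3\alpha/(\alpha + 1)$ and derive the method-of-moments estimator for $\alpha$.


9.79 Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a
Pareto distribution with parameters $\alpha$ and $\beta$, where $\beta$ is known. Then, if $\alpha > 0$,

$$f(y \mid \alpha, \beta) = \begin{cases} \alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $E(Y_i) = \alpha\beta/(\alpha - 1)$ if $\alpha > 1$ and $E(Y_i)$ is undefined if $0 < \alpha < 1$. Thus, the
method-of-moments estimator for $\alpha$ is undefined.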


The Method of Maximum Likelihood 


In Section 9.5, we presented a method for deriving an MVUE for a target parameter:
using the factorization criterion together with the Rao-Blackwell theorem. The
method requires that we find some function of a minimal sufficient statistic that is an
unbiased estimator for the target parameter. Although we have a method for finding a
sufficient statistic, the determination of the function of the minimal sufficient statistic
that gives us an unbiased estimator can be largely a matter of hit or miss. Section
9.6 contained a discussion of the method of moments. The method of moments is
intuitive and easy to apply but does not usually lead to the best estimators. In this
section, we present the method of maximum likelihood, which often leads to MVUEs.

We use an example to illustrate the logic upon which the method of maximum 
likelihood is based. Suppose that we are confronted with a box that contains three 
balls. We know that each of the balls may be red or white, but we do not know the 
total number of either color. However, we are allowed to randomly sample two of 
the balls without replacement. If our random sample yields two red balls, what would 
be a good estimate of the total number of red balls in the box? Obviously, the number 
of red balls in the box must be two or three (if there were zero or one red ball in the box, 
it would be impossible to obtain two red balls when sampling without replacement). 
If there are two red balls and one white ball in the box, the probability of randomly
selecting two red balls is

$$\frac{\binom{2}{2}\binom{1}{0}}{\binom{3}{2}} = \frac{1}{3}.$$

On the other hand, if there are three red balls in the box, the probability of randomly
selecting two red balls is

$$\frac{\binom{3}{2}}{\binom{3}{2}} = 1.$$


It should seem reasonable to choose three as the estimate of the number of red balls 
in the box because this estimate maximizes the probability of obtaining the observed 
sample. Of course, it is possible for the box to contain only two red balls, but the 
observed outcome gives more credence to there being three red balls in the box. 
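
The reasoning in this example can be checked by tabulating the probability of the observed sample (two red balls in two draws without replacement) for every possible number of red balls in the box. The sketch below is only an illustration; the function name is hypothetical.

```python
from math import comb


def prob_all_red(red_in_box, box_size=3, draws=2):
    """P(every drawn ball is red) when sampling without replacement."""
    # comb(red_in_box, draws) is 0 when there are fewer red balls than draws.
    return comb(red_in_box, draws) / comb(box_size, draws)


# Likelihood of the observed sample for 0, 1, 2, or 3 red balls in the box.
likelihood = {red: prob_all_red(red) for red in range(4)}

# The maximum-likelihood estimate is the value with the largest likelihood.
mle = max(likelihood, key=likelihood.get)
print(likelihood, mle)
```

The table assigns probability zero to zero or one red ball, and the largest probability to three red balls, which is the estimate chosen in the text.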

This example illustrates a method for finding an estimator that can be applied to 
any situation. The technique, called the method of maximum likelihood, selects as 
estimates the values of the parameters that maximize the likelihood (the joint proba- 
bility function or joint density function) of the observed sample (see Definition 9.4). 
Recall that we referred to this method of estimation in Chapter 3 where in Exam- 
ples 3.10 and 3.13 and Exercise 3.101 we found the maximum-likelihood estimates 
of the parameter p based on single observations on binomial, geometric, and negative 
binomial random variables, respectively. 


Method of Maximum Likelihood 


Suppose that the likelihood function depends on $k$ parameters $\theta_1, \theta_2, \ldots, \theta_k$.
Choose as estimates those values of the parameters that maximize the likelihood
$L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)$.


To emphasize the fact that the likelihood function is a function of the parameters
$\theta_1, \theta_2, \ldots, \theta_k$, we sometimes write the likelihood function as $L(\theta_1, \theta_2, \ldots, \theta_k)$. It
is common to refer to maximum-likelihood estimators as MLEs. We illustrate the
method with an example.


EXAMPLE 9.14 A binomial experiment consisting of $n$ trials resulted in observations $y_1, y_2, \ldots, y_n$,
where $y_i = 1$ if the $i$th trial was a success and $y_i = 0$ otherwise. Find the MLE of $p$,
the probability of a success.


Solution The likelihood of the observed sample is the probability of observing $y_1, y_2, \ldots, y_n$.
Hence,

$$L(p) = L(y_1, y_2, \ldots, y_n \mid p) = p^{y}(1 - p)^{n - y}, \quad \text{where } y = \sum_{i=1}^{n} y_i.$$


We now wish to find the value of $p$ that maximizes $L(p)$. If $y = 0$, $L(p) = (1 - p)^n$,
and $L(p)$ is maximized when $p = 0$. Analogously, if $y = n$, $L(p) = p^n$ and $L(p)$ is
maximized when $p = 1$. If $y = 1, 2, \ldots, n - 1$, then $L(p) = p^{y}(1 - p)^{n - y}$ is zero
when $p = 0$ and $p = 1$ and is continuous for values of $p$ between 0 and 1. Thus, for
$y = 1, 2, \ldots, n - 1$, we can find the value of $p$ that maximizes $L(p)$ by setting the
derivative $dL(p)/dp$ equal to 0 and solving for $p$.

You will notice that $\ln[L(p)]$ is a monotonically increasing function of $L(p)$.
Hence, both $\ln[L(p)]$ and $L(p)$ are maximized for the same value of $p$. Because
$L(p)$ is a product of functions of $p$ and finding the derivative of products is tedious,
it is easier to find the value of $p$ that maximizes $\ln[L(p)]$. We have

$$\ln[L(p)] = \ln\left[p^{y}(1 - p)^{n - y}\right] = y \ln p + (n - y)\ln(1 - p).$$

If $y = 1, 2, \ldots, n - 1$, the derivative of $\ln[L(p)]$ with respect to $p$ is

$$\frac{d \ln[L(p)]}{dp} = y\left(\frac{1}{p}\right) + (n - y)\left(\frac{-1}{1 - p}\right).$$

For $y = 1, 2, \ldots, n - 1$, the value of $p$ that maximizes (or minimizes) $\ln[L(p)]$ is
the solution of the equation

$$\frac{y}{\hat{p}} - \frac{n - y}{1 - \hat{p}} = 0.$$

Solving, we obtain the estimate $\hat{p} = y/n$. You can easily verify that this solution
occurs when $\ln[L(p)]$ [and hence $L(p)$] achieves a maximum.

Because $L(p)$ is maximized at $p = 0$ when $y = 0$, at $p = 1$ when $y = n$, and
at $p = y/n$ when $y = 1, 2, \ldots, n - 1$, whatever the observed value of $y$, $L(p)$ is
maximized when $p = y/n$.

The MLE, $\hat{p} = Y/n$, is the fraction of successes in the total number of trials $n$.
Hence, the MLE of $p$ is actually the intuitive estimator for $p$ that we used throughout
Chapter 8. ■
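
The closed-form answer $\hat{p} = y/n$ can be compared with a brute-force maximization of the log-likelihood $y \ln p + (n - y)\ln(1 - p)$ over a grid of $p$ values. This is only a sketch; the data ($n = 20$ trials, $y = 7$ successes) and the grid spacing are hypothetical choices.

```python
from math import log


def bernoulli_log_likelihood(p, y, n):
    """ln L(p) = y ln p + (n - y) ln(1 - p), valid for 0 < p < 1."""
    return y * log(p) + (n - y) * log(1 - p)


n, y = 20, 7  # hypothetical data: 7 successes in 20 trials

# Evaluate the log-likelihood on the grid 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, y, n))

p_closed = y / n  # closed-form MLE derived in the example
print(p_grid, p_closed)
```

The grid search stays strictly inside (0, 1), so it covers only the case $y = 1, 2, \ldots, n - 1$; the boundary cases $y = 0$ and $y = n$ are handled separately in the example.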


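EXAMPLE 9.15 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and
variance $\sigma^2$. Find the MLEs of $\mu$ and $\sigma^2$.


Solution Because $Y_1, Y_2, \ldots, Y_n$ are continuous random variables, $L(\mu, \sigma^2)$ is the joint density
of the sample. Thus, $L(\mu, \sigma^2) = f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2)$. In this case,

$$\begin{aligned}
L(\mu, \sigma^2) &= f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) \\
&= f(y_1 \mid \mu, \sigma^2) \times f(y_2 \mid \mu, \sigma^2) \times \cdots \times f(y_n \mid \mu, \sigma^2) \\
&= \left\{ \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ \frac{-(y_1 - \mu)^2}{2\sigma^2} \right] \right\} \times \cdots \times \left\{ \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ \frac{-(y_n - \mu)^2}{2\sigma^2} \right] \right\} \\
&= \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left[ \frac{-1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \right].
\end{aligned}$$

[Recall that $\exp(w)$ is just another way of writing $e^{w}$.] Further,

$$\ln[L(\mu, \sigma^2)] = -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2.$$

The MLEs of $\mu$ and $\sigma^2$ are the values that make $\ln[L(\mu, \sigma^2)]$ a maximum. Taking
derivatives with respect to $\mu$ and $\sigma^2$, we obtain

$$\frac{\partial \{\ln[L(\mu, \sigma^2)]\}}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)$$

and

$$\frac{\partial \{\ln[L(\mu, \sigma^2)]\}}{\partial \sigma^2} = -\left( \frac{n}{2} \right)\left( \frac{1}{\sigma^2} \right) + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2.$$

Setting these derivatives equal to zero and solving simultaneously, we obtain from
the first equation

$$\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{\mu}) = 0, \quad \text{or} \quad \sum_{i=1}^{n} y_i - n\hat{\mu} = 0, \quad \text{and} \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}.$$

Substituting $\bar{y}$ for $\hat{\mu}$ in the second equation and solving for $\hat{\sigma}^2$, we have

$$-\left( \frac{n}{\hat{\sigma}^2} \right) + \frac{1}{\hat{\sigma}^4} \sum_{i=1}^{n} (y_i - \bar{y})^2 = 0, \quad \text{or} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Thus, $\bar{Y}$ and $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ are the MLEs of $\mu$ and $\sigma^2$, respectively. Notice
that $\bar{Y}$ is unbiased for $\mu$. Although $\hat{\sigma}^2$ is not unbiased for $\sigma^2$, it can easily be adjusted
to the unbiased estimator $S^2$ (see Example 8.1). ■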
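
The two estimators from this example are available directly in Python's standard library: the MLE of $\sigma^2$ uses divisor $n$ (the "population" variance of the sample), whereas $S^2$ uses divisor $n - 1$. The short sketch below, on hypothetical data, shows the relationship $S^2 = [n/(n - 1)]\hat{\sigma}^2$.

```python
from statistics import fmean, pvariance, variance

# Hypothetical sample, used only to exercise the formulas.
data = [4.1, 5.3, 3.8, 6.0, 5.1]
n = len(data)

mu_hat = fmean(data)                   # MLE of mu: the sample mean
sigma2_hat = pvariance(data, mu_hat)   # MLE of sigma^2: divisor n
s2 = variance(data, mu_hat)            # unbiased S^2: divisor n - 1

print(mu_hat, sigma2_hat, s2)
```

Multiplying the MLE by $n/(n - 1)$ recovers $S^2$, which is the adjustment the text refers to.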


EXAMPLE 9.16 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of observations from a uniform distribution
with probability density function $f(y_i \mid \theta) = 1/\theta$, for $0 \le y_i \le \theta$ and $i = 1, 2, \ldots, n$.
Find the MLE of $\theta$.


Solution In this case, the likelihood is given by

$$L(\theta) = f(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1 \mid \theta) \times f(y_2 \mid \theta) \times \cdots \times f(y_n \mid \theta)
= \begin{cases} \dfrac{1}{\theta} \times \dfrac{1}{\theta} \times \cdots \times \dfrac{1}{\theta} = \dfrac{1}{\theta^n}, & \text{if } 0 \le y_i \le \theta,\ i = 1, 2, \ldots, n, \\ 0, & \text{otherwise.} \end{cases}$$

Obviously, $L(\theta)$ is not maximized when $L(\theta) = 0$. You will notice that $1/\theta^n$ is a
monotonically decreasing function of $\theta$. Hence, nowhere in the interval $0 < \theta < \infty$
is $d[1/\theta^n]/d\theta$ equal to zero. However, $1/\theta^n$ increases as $\theta$ decreases, and $1/\theta^n$ is
maximized by selecting $\theta$ to be as small as possible, subject to the constraint that all
of the $y_i$ values are between zero and $\theta$. The smallest value of $\theta$ that satisfies this
constraint is the maximum observation in the set $y_1, y_2, \ldots, y_n$. That is, $\hat{\theta} = Y_{(n)} =
\max(Y_1, Y_2, \ldots, Y_n)$ is the MLE for $\theta$. This MLE for $\theta$ is not an unbiased estimator
of $\theta$, but it can be adjusted to be unbiased, as shown in Example 9.1. ■
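
A minimal sketch of this estimator follows. The adjusted version multiplies the sample maximum by $(n + 1)/n$, which relies on the standard fact that $E[Y_{(n)}] = n\theta/(n + 1)$ for a uniform $(0, \theta)$ sample. The true value $\theta = 5$, the sample size, and the seed are hypothetical choices for the simulation.

```python
import random


def uniform_mle(sample):
    """MLE of theta for a uniform(0, theta) sample: the largest observation."""
    return max(sample)


def uniform_mle_unbiased(sample):
    """Bias-adjusted estimator, assuming E[max] = n * theta / (n + 1)."""
    n = len(sample)
    return (n + 1) / n * max(sample)


random.seed(0)            # fixed seed so the run is repeatable
theta = 5.0               # hypothetical true parameter
sample = [random.uniform(0, theta) for _ in range(10)]

theta_hat = uniform_mle(sample)
theta_adj = uniform_mle_unbiased(sample)
print(theta_hat, theta_adj)
```

Because every observation lies in $(0, \theta)$, the unadjusted MLE can never exceed $\theta$, which is one way to see that it underestimates on average.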


We have seen that sufficient statistics that best summarize the data have desirable
properties and often can be used to find an MVUE for parameters of interest. If $U$
is any sufficient statistic for the estimation of a parameter $\theta$, including the sufficient
statistic obtained from the optimal use of the factorization criterion, the MLE is
always some function of $U$. That is, the MLE depends on the sample observations
only through the value of a sufficient statistic. To show this, we need only observe
that if $U$ is a sufficient statistic for $\theta$, the factorization criterion (Theorem 9.4) implies
that the likelihood can be factored as

$$L(\theta) = L(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta)\, h(y_1, y_2, \ldots, y_n),$$

where $g(u, \theta)$ is a function of only $u$ and $\theta$ and $h(y_1, y_2, \ldots, y_n)$ does not depend
on $\theta$. Therefore, it follows that

$$\ln[L(\theta)] = \ln[g(u, \theta)] + \ln[h(y_1, y_2, \ldots, y_n)].$$

Notice that $\ln[h(y_1, y_2, \ldots, y_n)]$ does not depend on $\theta$ and therefore maximizing
$\ln[L(\theta)]$ relative to $\theta$ is equivalent to maximizing $\ln[g(u, \theta)]$ relative to $\theta$. Because
$\ln[g(u, \theta)]$ depends on the data only through the value of the sufficient statistic $U$, the
MLE for $\theta$ is always some function of $U$. Consequently, if an MLE for a parameter
can be found and then adjusted to be unbiased, the resulting estimator often is an
MVUE of the parameter in question.

MLEs have some additional properties that make this method of estimation particularly
attractive. In Example 9.9, we considered estimation of $\theta^2$, a function of the
parameter $\theta$. Functions of other parameters may also be of interest. For example, the
variance of a binomial random variable is $np(1 - p)$, a function of the parameter $p$.
If $Y$ has a Poisson distribution with mean $\lambda$, it follows that $P(Y = 0) = e^{-\lambda}$; we may
wish to estimate this function of $\lambda$. Generally, if $\theta$ is the parameter associated with
a distribution, we are sometimes interested in estimating some function of $\theta$, say
$t(\theta)$, rather than $\theta$ itself. In Exercise 9.94, you will prove that if $t(\theta)$ is a one-to-one
function of $\theta$ and if $\hat{\theta}$ is the MLE for $\theta$, then the MLE of $t(\theta)$ is given by

$$\widehat{t(\theta)} = t(\hat{\theta}).$$

This result, sometimes referred to as the invariance property of MLEs, also holds for
any function of a parameter of interest (not just one-to-one functions). See Casella
and Berger (2002) for details.


EXAMPLE 9.17 In Example 9.14, we found that the MLE of the binomial proportion $p$ is given by
$\hat{p} = Y/n$. What is the MLE for the variance of $Y$?


Solution The variance of a binomial random variable $Y$ is given by $V(Y) = np(1 - p)$.
Because $V(Y)$ is a function of the binomial parameter $p$ (namely, $V(Y) = t(p)$
with $t(p) = np(1 - p)$), it follows that the MLE of $V(Y)$ is given by

$$\widehat{V(Y)} = \widehat{t(p)} = t(\hat{p}) = n\left( \frac{Y}{n} \right)\left( 1 - \frac{Y}{n} \right).$$

This estimator is not unbiased. However, using the result in Exercise 9.65, we can
easily adjust it to make it unbiased. Actually,

$$n\left( \frac{Y}{n} \right)\left( 1 - \frac{Y}{n} \right)\left( \frac{n}{n - 1} \right) = \left( \frac{n^2}{n - 1} \right)\left( \frac{Y}{n} \right)\left( 1 - \frac{Y}{n} \right)$$

is the UMVUE for $t(p) = np(1 - p)$. ■
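
The bias of the plug-in estimator and the effect of the $n/(n - 1)$ adjustment can be verified exactly by summing over the binomial probability function. The sketch below does this for one hypothetical setting ($n = 10$, $p = .3$); the helper names are illustrative only.

```python
from math import comb


def variance_mle(y, n):
    """Plug-in (MLE) estimate of np(1 - p) using p_hat = y / n."""
    p_hat = y / n
    return n * p_hat * (1 - p_hat)


def variance_adjusted(y, n):
    """MLE multiplied by n / (n - 1) to remove the bias."""
    return n / (n - 1) * variance_mle(y, n)


def expected_value(estimator, n, p):
    """Exact E[estimator(Y, n)] for Y ~ binomial(n, p)."""
    return sum(
        estimator(y, n) * comb(n, y) * p ** y * (1 - p) ** (n - y)
        for y in range(n + 1)
    )


n, p = 10, 0.3  # hypothetical setting
target = n * p * (1 - p)
mean_mle = expected_value(variance_mle, n, p)
mean_adjusted = expected_value(variance_adjusted, n, p)
print(target, mean_mle, mean_adjusted)
```

The exact expectation of the plug-in estimator equals $(n - 1)p(1 - p)$, slightly below the target $np(1 - p)$, while the adjusted estimator matches the target.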
In the next section (optional), we summarize some of the convenient and useful
large-sample properties of MLEs.


Exercises 


9.80 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the Poisson distribution with
mean $\lambda$.

a Find the MLE $\hat{\lambda}$ for $\lambda$.
b Find the expected value and variance of $\hat{\lambda}$.
c Show that the estimator of part (a) is consistent for $\lambda$.
d What is the MLE for $P(Y = 0) = e^{-\lambda}$?


9.81 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponentially distributed
population with mean $\theta$. Find the MLE of the population variance $\theta^2$. [Hint: Recall Example 9.9.]


9.82 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the density function given by

$$f(y \mid \theta) = \begin{cases} \left( \dfrac{1}{\theta} \right) r y^{r - 1} e^{-y^{r}/\theta}, & \theta > 0,\ y > 0, \\ 0, & \text{elsewhere,} \end{cases}$$

where $r$ is a known positive constant.

a Find a sufficient statistic for $\theta$.
b Find the MLE of $\theta$.
c Is the estimator in part (b) an MVUE for $\theta$?


9.83 Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a uniform distribution with
probability density function

$$f(y \mid \theta) = \begin{cases} \dfrac{1}{2\theta + 1}, & 0 \le y \le 2\theta + 1, \\ 0, & \text{otherwise.} \end{cases}$$

a Obtain the MLE of $\theta$.
b Obtain the MLE for the variance of the underlying distribution.


9.84 A certain type of electronic component has a lifetime $Y$ (in hours) with probability density
function given by

$$f(y \mid \theta) = \begin{cases} \left( \dfrac{1}{\theta^2} \right) y e^{-y/\theta}, & y > 0, \\ 0, & \text{otherwise.} \end{cases}$$

That is, $Y$ has a gamma distribution with parameters $\alpha = 2$ and $\theta$. Let $\hat{\theta}$ denote the MLE of
$\theta$. Suppose that three such components, tested independently, had lifetimes of 120, 130, and
128 hours.

a Find the MLE of $\theta$.
b Find $E(\hat{\theta})$ and $V(\hat{\theta})$.
c Suppose that $\theta$ actually equals 130. Give an approximate bound that you might expect for
the error of estimation.
d What is the MLE for the variance of $Y$?
9.85 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the density function given by

$$f(y \mid \alpha, \theta) = \begin{cases} \left( \dfrac{1}{\Gamma(\alpha)\theta^{\alpha}} \right) y^{\alpha - 1} e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases}$$

where $\alpha > 0$ is known.

a Find the MLE $\hat{\theta}$ of $\theta$.
b Find the expected value and variance of $\hat{\theta}$.
c Show that $\hat{\theta}$ is consistent for $\theta$.
d What is the best (minimal) sufficient statistic for $\theta$ in this problem?
e Suppose that $n = 5$ and $\alpha = 2$. Use the minimal sufficient statistic to construct a 90%
confidence interval for $\theta$. [Hint: Transform to a $\chi^2$ distribution.]


9.86 Suppose that $X_1, X_2, \ldots, X_m$, representing yields per acre for corn variety A, constitute a
random sample from a normal distribution with mean $\mu_1$ and variance $\sigma^2$. Also, $Y_1, Y_2, \ldots, Y_n$,
representing yields for corn variety B, constitute a random sample from a normal distribution
with mean $\mu_2$ and variance $\sigma^2$. If the $X$'s and $Y$'s are independent, find the MLE for the
common variance $\sigma^2$. Assume that $\mu_1$ and $\mu_2$ are unknown.


9.87 A random sample of 100 voters selected from a large population revealed 30 favoring candidate
A, 38 favoring candidate B, and 32 favoring candidate C. Find MLEs for the proportions of
voters in the population favoring candidates A, B, and C, respectively. Estimate the difference
between the fractions favoring A and B and place a 2-standard-deviation bound on the error of
estimation.


9.88 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function

$$f(y \mid \theta) = \begin{cases} (\theta + 1)y^{\theta}, & 0 < y < 1,\ \theta > -1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the MLE for $\theta$. Compare your answer to the method-of-moments estimator found in
Exercise 9.69.


9.89 It is known that the probability $p$ of tossing heads on an unbalanced coin is either 1/4 or 3/4.
The coin is tossed twice and a value for $Y$, the number of heads, is observed. For each possible
value of $Y$, which of the two values for $p$ (1/4 or 3/4) maximizes the probability that $Y = y$?
Depending on the value of $y$ actually observed, what is the MLE of $p$?


9.90 A random sample of 100 men produced a total of 25 who favored a controversial local
issue. An independent random sample of 100 women produced a total of 30 who favored
the issue. Assume that $p_M$ is the true underlying proportion of men who favor the issue
and that $p_W$ is the true underlying proportion of women who favor the issue. If it actually is
true that $p_W = p_M = p$, find the MLE of the common proportion $p$.


*9.91 Find the MLE of $\theta$ based on a random sample of size $n$ from a uniform distribution on the
interval $(0, 2\theta)$.


*9.92 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function

$$f(y \mid \theta) = \begin{cases} \dfrac{3y^2}{\theta^3}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$

In Exercise 9.52, you showed that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.

a Find the MLE for $\theta$. [Hint: See Example 9.16.]
b Find a function of the MLE in part (a) that is a pivotal quantity. [Hint: See Exercise 9.63.]
c Use the pivotal quantity from part (b) to find a $100(1 - \alpha)\%$ confidence interval for $\theta$.
9.93 Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function

$$f(y \mid \theta) = \begin{cases} \dfrac{2\theta^2}{y^3}, & \theta < y < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

In Exercise 9.53, you showed that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.

a Find the MLE for $\theta$. [Hint: See Example 9.16.]
b Find a function of the MLE in part (a) that is a pivotal quantity.
c Use the pivotal quantity from part (b) to find a $100(1 - \alpha)\%$ confidence interval for $\theta$.


9.94 Suppose that $\hat{\theta}$ is the MLE for a parameter $\theta$. Let $t(\theta)$ be a function of $\theta$ that possesses a unique
inverse [that is, if $\beta = t(\theta)$, then $\theta = t^{-1}(\beta)$]. Show that $t(\hat{\theta})$ is the MLE of $t(\theta)$.


9.95 A random sample of $n$ items is selected from the large number of items produced by a certain
production line in one day. Find the MLE of the ratio $R$, the proportion of defective items
divided by the proportion of good items.


9.96 Consider a random sample of size $n$ from a normal population with mean $\mu$ and variance $\sigma^2$,
both unknown. Derive the MLE of $\sigma$.


9.97 The geometric probability mass function is given by

$$p(y \mid p) = p(1 - p)^{y - 1}, \quad y = 1, 2, 3, \ldots.$$

A random sample of size $n$ is taken from a population with a geometric distribution.

a Find the method-of-moments estimator for $p$.
b Find the MLE for $p$.


Some Large-Sample Properties of 
Maximum-Likelihood Estimators (Optional) 


Maximum-likelihood estimators also have interesting large-sample properties. Suppose
that $t(\theta)$ is a differentiable function of $\theta$. In Section 9.7, we argued by the
invariance property that if $\hat{\theta}$ is the MLE of $\theta$, then the MLE of $t(\theta)$ is given by
$t(\hat{\theta})$. Under some conditions of regularity that hold for the distributions that we will
consider, $t(\hat{\theta})$ is a consistent estimator for $t(\theta)$. In addition, for large sample sizes,

$$Z = \frac{t(\hat{\theta}) - t(\theta)}{\sqrt{\left[ \dfrac{\partial t(\theta)}{\partial \theta} \right]^2 \Big/ \; nE\left[ -\dfrac{\partial^2 \ln f(Y \mid \theta)}{\partial \theta^2} \right]}}$$

has approximately a standard normal distribution. In this expression, the quantity
$f(Y \mid \theta)$ in the denominator is the density function corresponding to the continuous
distribution of interest, evaluated at the random value $Y$. In the discrete case, the
analogous result holds with the probability function evaluated at the random value $Y$,
$p(Y \mid \theta)$, substituted for the density $f(Y \mid \theta)$. If we desire a confidence interval for $t(\theta)$,
we can use quantity $Z$ as a pivotal quantity. If we proceed as in Section 8.6, we obtain
the following approximate large-sample $100(1 - \alpha)\%$ confidence interval for $t(\theta)$:

$$t(\hat{\theta}) \pm z_{\alpha/2} \sqrt{\left. \left( \left[ \frac{\partial t(\theta)}{\partial \theta} \right]^2 \Big/ \; nE\left[ -\frac{\partial^2 \ln f(Y \mid \theta)}{\partial \theta^2} \right] \right) \right|_{\theta = \hat{\theta}}}.$$

We illustrate this with the following example.


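EXAMPLE 9.18 For a random variable with a Bernoulli distribution, $p(y \mid p) = p^{y}(1 - p)^{1 - y}$, for
$y = 0, 1$. If $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from this distribution,
derive a $100(1 - \alpha)\%$ confidence interval for $p(1 - p)$, the variance associated with
this distribution.


Solution As in Example 9.14, the MLE of the parameter $p$ is given by $\hat{p} = W/n$, where
$W = \sum_{i=1}^{n} Y_i$. It follows that the MLE for $t(p) = p(1 - p)$ is $t(\hat{p}) = \hat{p}(1 - \hat{p})$.
In this case,

$$t(p) = p(1 - p) = p - p^2 \quad \text{and} \quad \frac{\partial t(p)}{\partial p} = 1 - 2p.$$

Also,

$$\begin{aligned}
p(y \mid p) &= p^{y}(1 - p)^{1 - y}, \\
\ln[p(y \mid p)] &= y(\ln p) + (1 - y)\ln(1 - p), \\
\frac{\partial \ln[p(y \mid p)]}{\partial p} &= \frac{y}{p} - \frac{1 - y}{1 - p}, \\
\frac{\partial^2 \ln[p(y \mid p)]}{\partial p^2} &= -\frac{y}{p^2} - \frac{1 - y}{(1 - p)^2}, \\
E\left\{ -\frac{\partial^2 \ln[p(Y \mid p)]}{\partial p^2} \right\} &= E\left[ \frac{Y}{p^2} + \frac{1 - Y}{(1 - p)^2} \right]
= \frac{p}{p^2} + \frac{1 - p}{(1 - p)^2} = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}.
\end{aligned}$$

Substituting into the earlier formula for the confidence interval for $t(\theta)$, we obtain

$$\begin{aligned}
t(\hat{p}) \pm z_{\alpha/2} \sqrt{\left. \left( \left[ \frac{\partial t(p)}{\partial p} \right]^2 \Big/ \; nE\left\{ -\frac{\partial^2 \ln p(Y \mid p)}{\partial p^2} \right\} \right) \right|_{p = \hat{p}}}
&= \hat{p}(1 - \hat{p}) \pm z_{\alpha/2} \sqrt{\left. \left( (1 - 2p)^2 \Big/ \; n\left[ \frac{1}{p(1 - p)} \right] \right) \right|_{p = \hat{p}}} \\
&= \hat{p}(1 - \hat{p}) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})(1 - 2\hat{p})^2}{n}}
\end{aligned}$$

as the desired confidence interval for $p(1 - p)$. ■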
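
The final interval is simple to evaluate numerically. The sketch below computes $\hat{p}(1 - \hat{p}) \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})(1 - 2\hat{p})^2/n}$ for hypothetical data (30 successes in 100 trials); the function name and the default confidence level are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_variance_ci(successes, n, confidence=0.95):
    """Approximate large-sample interval for p(1 - p) from Example 9.18."""
    p_hat = successes / n
    # z_{alpha/2}: upper-tail critical value of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    t_hat = p_hat * (1 - p_hat)
    half_width = z * sqrt(p_hat * (1 - p_hat) * (1 - 2 * p_hat) ** 2 / n)
    return t_hat - half_width, t_hat + half_width


# Hypothetical data: 30 successes in n = 100 Bernoulli trials.
lower, upper = bernoulli_variance_ci(30, 100)
print(lower, upper)
```

Note that the factor $(1 - 2\hat{p})$ vanishes when $\hat{p} = .5$, so the interval collapses to a single point there; the approximation should be read with that limitation in mind.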
Exercises


9.98 Refer to Exercise 9.97. What is the approximate variance of the MLE?


9.99 Consider the distribution discussed in Example 9.18. Use the method presented in Section 9.8
to derive a $100(1 - \alpha)\%$ confidence interval for $t(p) = p$. Is the resulting interval familiar to
you?


9.100 Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from an exponential
distribution with mean $\theta$. Find a $100(1 - \alpha)\%$ confidence interval for $t(\theta) = \theta^2$.


9.101 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a Poisson distribution with mean
$\lambda$. Find a $100(1 - \alpha)\%$ confidence interval for $t(\lambda) = e^{-\lambda} = P(Y = 0)$.


9.102 Refer to Exercises 9.97 and 9.98. If a sample of size 30 yields $\bar{y} = 4.4$, find a 95% confidence
interval for $p$.


Summary 


In this chapter, we continued and extended the discussion of estimation begun in 
Chapter 8. Good estimators are consistent and efficient when compared to other 
estimators. The most efficient estimators, those with the smallest variances, are func- 
tions of the sufficient statistics that best summarize all of the information about the 
parameter of interest. 

Two methods of finding estimators—the method of moments and the method of 
maximum likelihood— were presented. Moment estimators are consistent but gener- 
ally not very efficient. MLEs, on the other hand, are consistent and, if adjusted to be 
unbiased, often lead to minimum-variance unbiased estimators. Because they have 
many good properties, MLEs are often used in practice. 
Supplementary Exercises


9.103 A random sample of size $n$ is taken from a population with a Rayleigh distribution. As in
Exercise 9.34, the Rayleigh density function is

$$f(y) = \begin{cases} \left( \dfrac{2y}{\theta} \right) e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the MLE of $\theta$.
*b Find the approximate variance of the MLE obtained in part (a).


9.104 Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from the density function

$$f(y \mid \theta) = \begin{cases} e^{-(y - \theta)}, & y > \theta, \\ 0, & \text{elsewhere,} \end{cases}$$

where $\theta$ is an unknown, positive constant.

a Find an estimator $\hat{\theta}_1$ for $\theta$ by the method of moments.
b Find an estimator $\hat{\theta}_2$ for $\theta$ by the method of maximum likelihood.
c Adjust $\hat{\theta}_1$ and $\hat{\theta}_2$ so that they are unbiased. Find the efficiency of the adjusted $\hat{\theta}_1$ relative to
the adjusted $\hat{\theta}_2$.

9.105 Refer to Exercise 9.38(b). Under the conditions outlined there, find the MLE of $\sigma^2$.


9.106 Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Poisson distribution with mean
$\lambda$. Find the MVUE of $P(Y_i = 0) = e^{-\lambda}$. [Hint: Make use of the Rao-Blackwell theorem.]


9.107 Suppose that a random sample of length-of-life measurements, $Y_1, Y_2, \ldots, Y_n$, is to be taken
of components whose length of life has an exponential distribution with mean $\theta$. It is frequently
of interest to estimate

$$\bar{F}(t) = 1 - F(t) = e^{-t/\theta},$$

the reliability at time $t$ of such a component. For any fixed value of $t$, find the MLE of $\bar{F}(t)$.


9.108 The MLE obtained in Exercise 9.107 is a function of the minimal sufficient statistic for $\theta$, but
it is not unbiased. Use the Rao-Blackwell theorem to find the MVUE of $e^{-t/\theta}$ by the following
steps.

a Let

$$V = \begin{cases} 1, & Y_1 > t, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $V$ is an unbiased estimator of $e^{-t/\theta}$.

b Because $U = \sum_{i=1}^{n} Y_i$ is the minimal sufficient statistic for $\theta$, show that the conditional
density function for $Y_1$, given $U = u$, is

$$f_{Y_1 \mid U}(y_1 \mid u) = \begin{cases} \left( \dfrac{n - 1}{u^{n - 1}} \right)(u - y_1)^{n - 2}, & 0 < y_1 < u, \\ 0, & \text{elsewhere.} \end{cases}$$

c Show that

$$E(V \mid U) = P(Y_1 > t \mid U) = \left( 1 - \frac{t}{U} \right)^{n - 1}.$$

This is the MVUE of $e^{-t/\theta}$ by the Rao-Blackwell theorem and by the fact that the density
function for $U$ is complete.


9.109 Suppose that $n$ integers are drawn at random and with replacement from the integers $1, 2, \ldots, N$.
That is, each sampled integer has probability $1/N$ of taking on any of the values $1, 2, \ldots, N$,
and the sampled values are independent.

a Find the method-of-moments estimator $\hat{N}_1$ of $N$.
b Find $E(\hat{N}_1)$ and $V(\hat{N}_1)$.


9.110 Refer to Exercise 9.109.

a Find the MLE $\hat{N}_2$ of $N$.
b Show that $E(\hat{N}_2)$ is approximately $[n/(n + 1)]N$. Adjust $\hat{N}_2$ to form an estimator $\hat{N}_3$ that
is approximately unbiased for $N$.
c Find an approximate variance for $\hat{N}_3$ by using the fact that for large $N$ the variance of the
largest sampled integer is approximately

$$\frac{nN^2}{(n + 1)^2(n + 2)}.$$

d Show that for large $N$ and $n > 1$, $V(\hat{N}_3) < V(\hat{N}_1)$.


9.111 Refer to Exercise 9.110. Suppose that enemy tanks have serial numbers $1, 2, \ldots, N$. A spy
randomly observed five tanks (with replacement) with serial numbers 97, 64, 118, 210, and
57. Estimate $N$ and place a bound on the error of estimation.


9.112 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Poisson distribution with mean $\lambda$ and define


a Show that the distribution of $W_n$ converges to a standard normal distribution.

b Use $W_n$ and the result in part (a) to derive the formula for an approximate 95% confidence
interval for $\lambda$.
Introduction


Recall that the objective of statistics often is to make inferences about unknown pop- 
ulation parameters based on information contained in sample data. These inferences 
are phrased in one of two ways: as estimates of the respective parameters or as tests of 
hypotheses about their values. Chapters 8 and 9 dealt with estimation. In this chapter, 
we discuss the general topic of hypothesis testing. 

In many ways, the formal procedure for hypothesis testing is similar to the scientific 
method. The scientist observes nature, formulates a theory, and then tests this theory 
against observation. In our context, the scientist poses a hypothesis concerning one or 
more population parameters—that they equal specified values. She then samples the 
population and compares her observations with the hypothesis. If the observations 
disagree with the hypothesis, the scientist rejects it. If not, the scientist concludes 
either that the hypothesis is true or that the sample did not detect the difference 
between the real and hypothesized values of the population parameters. 
For example, a medical researcher may hypothesize that a new drug is more effective
than another in combating a disease. To test her hypothesis, she randomly selects
patients infected with the disease and randomly divides them into two groups. The 
new drug A is given to the patients in the first group, and the old drug B is given to 
the patients in the second group. Then, based on the number of patients in each group 
who recover from the disease, the researcher must decide whether the new drug is 
more effective than the old. 

Hypothesis tests are conducted in all fields in which theory can be tested against 
observation. A quality control engineer may hypothesize that a new assembly method 
produces only 5% defective items. An educator may claim that two methods of teach- 
ing reading are equally effective, or a political candidate may claim that a plurality 
of voters favor his election. All such hypotheses can be subjected to statistical verifi- 
cation by using observed sample data. 

What is the role of statistics in testing hypotheses? Putting it more bluntly, of what 
value is statistics in this hypothesis testing procedure? Testing a hypothesis requires 
making a decision when comparing the observed sample with theory. How do we 
decide whether the sample disagrees with the scientist’s hypothesis? When should 
we reject the hypothesis, when should we accept it, and when should we withhold 
judgment? What is the probability that we will make the wrong decision and conse- 
quently be led to a loss? And, particularly, what function of the sample measurements 
should be employed to reach a decision? The answers to these questions are contained 
in a study of statistical hypothesis testing. 

Chapter 8 introduced the general topic of estimation and presented some intuitive 
estimation procedures. Chapter 9 presented some properties of estimators and some 
formal methods for deriving estimators. We use the same approach in our discussion 
of hypothesis testing. That is, we introduce the topic, present some intuitive testing 
procedures, and then consider some formal methods for deriving statistical hypothesis 
testing procedures. 


Elements of a Statistical Test 


Many times, the objective of a statistical test is to test a hypothesis concerning the
values of one or more population parameters. We generally have a theory (a research
hypothesis) about the parameter(s) that we wish to support. For example, suppose
that a political candidate, Jones, claims that he will gain more than 50% of the votes
in a city election and thereby emerge as the winner. If we do not believe Jones's
claim, we might seek to support the research hypothesis that Jones is not favored by
more than 50% of the electorate. Support for this research hypothesis, also called the
alternative hypothesis, is obtained by showing (using the sample data as evidence) that
the converse of the alternative hypothesis, called the null hypothesis, is false. Thus,
support for one theory is obtained by showing lack of support for its converse;
in a sense, this is a proof by contradiction. Because we seek support for the alternative
hypothesis that Jones's claim is false, our alternative hypothesis is that $p$, the
probability of selecting a voter favoring Jones, is less than .5. If we can show that the
data support rejection of the null hypothesis $p = .5$ (the minimum value needed for a
plurality) in favor of the alternative hypothesis $p < .5$, we have achieved our research
objective. Although it is common to speak of testing a null hypothesis, the research
objective usually is to show support for the alternative hypothesis, if such support is
warranted.

How do we use the data to decide between the null hypothesis and the alternative
hypothesis? Suppose that $n = 15$ voters are randomly selected from the city and $Y$,
the number favoring Jones, is recorded. If none in the sample favor Jones ($Y = 0$),
what would you conclude about Jones's claim? If Jones is actually favored by more
than 50% of the electorate, it is not impossible to observe $Y = 0$ favoring Jones in
a sample of size $n = 15$, but it is highly improbable. It is much more likely that we
would observe $Y = 0$ if the alternative hypothesis were true. Thus, we would reject
the null hypothesis ($p = .5$) in favor of the alternative hypothesis ($p < .5$). If we
observed $Y = 1$ (or any small value of $Y$), analogous reasoning would lead us to the
same conclusion.
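
To put a number on "highly improbable," the binomial probability of observing $Y = 0$ in $n = 15$ trials can be computed under the null value $p = .5$ and, for comparison, under one value from the alternative. The sketch below uses $p = .3$ as that comparison value; this choice is hypothetical and serves only as an illustration.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ binomial(n, p)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)


n = 15
prob_zero_null = binomial_pmf(0, n, 0.5)  # under the null value p = .5
prob_zero_alt = binomial_pmf(0, n, 0.3)   # under a hypothetical alternative p = .3

print(prob_zero_null, prob_zero_alt)
```

Under $p = .5$ the probability is $(.5)^{15}$, which is far smaller than the corresponding probability under the smaller value of $p$, in line with the argument above.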

Any statistical test of hypotheses works in exactly the same way and is composed 
of the same essential elements. 


The Elements of a Statistical Test 


1. Null hypothesis, $H_0$
2. Alternative hypothesis, $H_a$
3. Test statistic
4. Rejection region


For our example, the hypothesis to be tested, called the null hypothesis and denoted
by $H_0$, is $p = .5$. The alternative (or research) hypothesis, denoted as $H_a$, is the
hypothesis to be accepted in case $H_0$ is rejected. The alternative hypothesis usually
is the hypothesis that we seek to support on the basis of the information contained in
the sample; thus, in our example, $H_a$ is $p < .5$.

The functioning parts of a statistical test are the test statistic and an associated
rejection region. The test statistic (like an estimator) is a function of the sample
measurements ($Y$ in our example) on which the statistical decision will be based.
The rejection region, which will henceforth be denoted by RR, specifies the values of
the test statistic for which the null hypothesis is to be rejected in favor of the alternative
hypothesis. If for a particular sample, the computed value of the test statistic falls in
the rejection region RR, we reject the null hypothesis $H_0$ and accept the alternative
hypothesis $H_a$. If the value of the test statistic does not fall into the RR, we accept $H_0$.
As previously indicated, for our example small values of $Y$ would lead us to reject
$H_0$. Therefore, one rejection region that we might want to consider is the set of all
values of $Y$ less than or equal to 2. We will use the notation RR $= \{y : y \le 2\}$ or,
more simply, RR $= \{y \le 2\}$ to denote this rejection region.

Finding a good rejection region for a statistical test is an interesting problem
that merits further attention. It is clear that small values of Y, say y ≤ k (see
Figure 10.1), are contradictory to the hypothesis H0: p = .5 but favorable to the
alternative Ha: p < .5. So we intuitively choose the rejection region as RR =
{y ≤ k}. But what value should we choose for k? More generally, we seek some


FIGURE 10.1 Rejection region, RR = {y ≤ k}, for a test of the hypothesis H0: p = .5 against the alternative Ha: p < .5. (Horizontal axis: y, the number of voters favoring Jones, from 0 to 15; the rejection region is the set of values 0, 1, ..., k.)


objective criteria for deciding which value of k specifies a good rejection region of the
form {y ≤ k}.

For any fixed rejection region (determined by a particular value of k), two types 
of errors can be made in reaching a decision. We can decide in favor of Ha when H0
is true (make a type I error), or we can decide in favor of H0 when Ha is true (make
a type II error).


A type I error is made if H0 is rejected when H0 is true. The probability of a
type I error is denoted by α. The value of α is called the level of the test.

A type II error is made if H0 is accepted when Ha is true. The probability
of a type II error is denoted by β.


For Jones's political poll, making a type I error—rejecting Ho: p = .5 (and thereby 
accepting Ha : p < .5) when in fact Ho is true—means concluding that Jones will lose 
when, in fact, he is going to win. In contrast, making a type II error means accepting
H0: p = .5 when p < .5 and concluding that Jones will win when, in fact, he will
lose. For most real situations, incorrect decisions cost money, prestige, or time and
imply a loss. Thus, α and β, the probabilities of making these two types of errors,
measure the risks associated with the two possible erroneous decisions that might 
result from a statistical test. As such, they provide a very practical way to measure 
the goodness of a test. 


EXAMPLE 10.1

For Jones's political poll, n = 15 voters were sampled. We wish to test H0: p = .5
against the alternative, Ha: p < .5. The test statistic is Y, the number of sampled
voters favoring Jones. Calculate α if we select RR = {y ≤ 2} as the rejection region.

Solution

By definition,

\[
\begin{aligned}
\alpha &= P(\text{type I error}) = P(\text{rejecting } H_0 \text{ when } H_0 \text{ is true}) \\
       &= P(\text{value of test statistic is in RR when } H_0 \text{ is true}) \\
       &= P(Y \le 2 \text{ when } p = .5).
\end{aligned}
\]

Observe that Y is a binomial random variable with n = 15. If H0 is true, p = .5 and
we obtain

\[
\alpha = \sum_{y=0}^{2} \binom{15}{y} (.5)^{15}
       = \binom{15}{0}(.5)^{15} + \binom{15}{1}(.5)^{15} + \binom{15}{2}(.5)^{15}.
\]

Using Table 1, Appendix 3, to circumvent this computation, we find α = .004. Thus,
if we decide to use the rejection region RR = {y ≤ 2}, we subject ourselves to a very
small risk (α = .004) of concluding that Jones will lose if in fact he is a winner.
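The table lookup in Example 10.1 can also be checked by summing the binomial probabilities directly. The following is a minimal sketch using only the Python standard library; the helper name `binom_cdf` is an assumption introduced here for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y binomial with n trials and success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

# Level of the test with RR = {y <= 2}, n = 15, computed under H0: p = .5
alpha = binom_cdf(2, 15, 0.5)
print(f"alpha = {alpha:.3f}")
```

The exact sum is 121/32768, which agrees with the tabled value α = .004 to three decimal places.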


EXAMPLE 10.2

Refer to Example 10.1. Is our test equally good in protecting us from concluding
that Jones is a winner if in fact he will lose? Suppose that he will receive 30% of the
votes (p = .3). What is the probability β that the sample will erroneously lead us to
conclude that H0 is true and that Jones is going to win?

Solution

By definition,

\[
\begin{aligned}
\beta &= P(\text{type II error}) = P(\text{accepting } H_0 \text{ when } H_a \text{ is true}) \\
      &= P(\text{value of the test statistic is not in RR when } H_a \text{ is true}).
\end{aligned}
\]

Because we want to calculate β when p = .3 (a particular value of p that is in Ha),

\[
\beta = P(Y > 2 \text{ when } p = .3) = \sum_{y=3}^{15} \binom{15}{y} (.3)^{y} (.7)^{15-y}.
\]

Again consulting Table 1, Appendix 3, we find that β = .873. If we use RR = {y ≤ 2},
our test will usually lead us to conclude that Jones is a winner (with probability
β = .873), even if p is as low as p = .3.

The value of β depends on the true value of the parameter p. The larger the
difference is between p and the (null) hypothesized value of p = .5, the smaller is
the likelihood that we will fail to reject the null hypothesis.
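The dependence of β on the true value of p can be tabulated with a few lines of Python. This is a sketch under stated assumptions: the function names `binom_cdf` and `beta` and the particular grid of p values are illustrative choices, while n = 15 and RR = {y ≤ 2} come from Example 10.1.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y binomial with n trials and success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def beta(p, n=15, k=2):
    """P(type II error) = P(Y > k) when the true success probability is p."""
    return 1 - binom_cdf(k, n, p)

# Alternatives chosen for illustration, moving away from the null value .5
for p in (0.45, 0.4, 0.3, 0.2):
    print(f"p = {p:.2f}  beta = {beta(p):.3f}")
```

The printed values decrease as p moves farther below .5, in line with the statement above.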


EXAMPLE 10.3

Refer to Examples 10.1 and 10.2. Calculate the value of β if Jones will receive only
10% of the votes (p = .1).

Solution

In this case, we want to calculate β when p = .1 (another particular value of p in Ha).

\[
\begin{aligned}
\beta &= P(\text{type II error}) = P(\text{accepting } H_0 \text{ when } p = .1) \\
      &= P(\text{value of test statistic is not in RR when } p = .1) \\
      &= P(Y > 2 \text{ when } p = .1) = \sum_{y=3}^{15} \binom{15}{y} (.1)^{y} (.9)^{15-y} = .184.
\end{aligned}
\]

Consequently, if we use {y ≤ 2} as the rejection region, the value of β when p = .10
is smaller than the value for β that we obtained in Example 10.2 with p = .30
(.184 versus .873). Nonetheless, when using this rejection region, we still have a
fairly large probability of claiming that Jones is a winner if in fact he will receive
only 10% of the votes.

Examples 10.1 through 10.3 show that the test using RR = {y ≤ 2} guarantees a
low risk of making a type I error (α = .004), but it does not offer adequate protection
against a type II error. How can we improve our test? One way is to balance α and β
by changing the rejection region. If we enlarge RR into a new rejection region RR*
(that is, RR ⊂ RR*), the test using RR* will lead us to reject H0 more often. If α* and
α denote the probabilities of type I errors (levels of the tests) when we use RR* and
RR as the rejection regions, respectively, then, because RR ⊂ RR*,

\[
\alpha^* = P(\text{test statistic is in } RR^* \text{ when } H_0 \text{ is true})
\ge P(\text{test statistic is in } RR \text{ when } H_0 \text{ is true}) = \alpha.
\]

Likewise, if we use the enlarged rejection region RR*, the test procedure will lead
us to accept H0 less often. If β* and β denote the probabilities of type II errors for
the tests using RR* and RR, respectively, then

\[
\beta^* = P(\text{test statistic is not in } RR^* \text{ when } H_a \text{ is true})
\le P(\text{test statistic is not in } RR \text{ when } H_a \text{ is true}) = \beta.
\]

Hence, if we change the rejection region to increase α, then β will decrease. Similarly,
if the change in rejection region results in a decrease in α, then β will increase. Thus,
α and β are inversely related.
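The inverse relationship can be seen numerically by enlarging the rejection region {y ≤ k} one value at a time. The sketch below assumes the setting of the Jones poll (n = 15, null value p = .5) and evaluates β at the single alternative p = .3; the function names and the range of k are illustrative assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y binomial with n trials and success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def error_probabilities(k, n=15, p0=0.5, pa=0.3):
    """Return (alpha, beta) for the rejection region {y <= k}."""
    alpha = binom_cdf(k, n, p0)     # P(reject H0 when p = p0)
    beta = 1 - binom_cdf(k, n, pa)  # P(accept H0 when p = pa)
    return alpha, beta

for k in range(5):
    alpha, beta = error_probabilities(k)
    print(f"RR = {{y <= {k}}}  alpha = {alpha:.3f}  beta = {beta:.3f}")
```

Each step up in k adds a point to the rejection region, so α can only grow and β can only shrink.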


EXAMPLE 10.4

Refer to the test discussed in Example 10.1. Now assume that RR = {y ≤ 5}. Calculate
the level α of the test and calculate β if p = .3. Compare the results with the values
obtained in Examples 10.1 and 10.2 (where we used RR = {y ≤ 2}).

Solution

In this case,

\[
\begin{aligned}
\alpha &= P(\text{test statistic is in RR when } H_0 \text{ is true}) \\
       &= P(Y \le 5 \text{ when } p = .5) = \sum_{y=0}^{5} \binom{15}{y} (.5)^{15} = .151.
\end{aligned}
\]

When p = .3,

\[
\begin{aligned}
\beta &= P(\text{test statistic is not in RR when } H_a \text{ is true and } p = .3) \\
      &= P(Y > 5 \text{ when } p = .3) = \sum_{y=6}^{15} \binom{15}{y} (.3)^{y} (.7)^{15-y} = .278.
\end{aligned}
\]

A comparison of the α and β calculated here with the results of Examples 10.1 and
10.2 shows that enlarging the rejection region from RR = {y ≤ 2} to RR* = {y ≤ 5}
increased α and decreased β (see Table 10.1). Hence, we have achieved a better
balance between the risks of type I and type II errors, but both α and β remain
disconcertingly large. How can we reduce both α and β? The answer is intuitively
clear: Shed more light on the true nature of the population by increasing the sample
size. For almost all statistical tests, if α is fixed at some acceptably small value, β
decreases as the sample size increases.

Table 10.1 Comparison of α and β for two different rejection regions

Probabilities of Error     RR = {y ≤ 2}     RR = {y ≤ 5}
α                          .004             .151
β when p = .3              .873             .278
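The effect of sample size can be sketched numerically. The code below is an illustration under assumptions that are not part of Example 10.4: the level is capped at .05, the rejection region for each n is taken to be the largest {y ≤ k} whose level does not exceed that cap, and β is evaluated at p = .3. The helper names and the chosen sample sizes are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y binomial with n trials and success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def largest_k_at_level(n, alpha_max, p0=0.5):
    """Largest k with P(Y <= k | p0) <= alpha_max, or None if no such k exists."""
    k = None
    for candidate in range(n + 1):
        if binom_cdf(candidate, n, p0) <= alpha_max:
            k = candidate
        else:
            break  # the CDF is nondecreasing, so no larger k can qualify
    return k

results = []
for n in (15, 30, 60, 100):
    k = largest_k_at_level(n, 0.05)
    alpha = binom_cdf(k, n, 0.5)
    beta = 1 - binom_cdf(k, n, 0.3)
    results.append((n, k, alpha, beta))
    print(f"n = {n:3d}  RR = {{y <= {k}}}  alpha = {alpha:.3f}  beta = {beta:.3f}")
```

With the level held at or below .05, the computed β at p = .3 falls steadily as n grows.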


In this section, we have defined the essential elements of any statistical test. We
have seen that two possible types of error can be made when testing hypotheses: type I
and type II errors. The probabilities of these errors serve as criteria for evaluating a
testing procedure. In the next few sections, we will use the sampling distributions
derived in Chapter 7 to develop methods for testing hypotheses about parameters of
frequent practical interest.


Exercises


10.1 Define α and β for a statistical test of hypotheses.

10.2 An experimenter has prepared a drug dosage level that she claims will induce sleep for 80% of
people suffering from insomnia. After examining the dosage, we feel that her claims regarding
the effectiveness of the dosage are inflated. In an attempt to disprove her claim, we administer
her prescribed dosage to 20 insomniacs and we observe Y, the number for whom the drug dose
induces sleep. We wish to test the hypothesis H0: p = .8 versus the alternative, Ha: p < .8.
Assume that the rejection region {y ≤ 12} is used.

a In terms of this problem, what is a type I error?
b Find α.
c In terms of this problem, what is a type II error?
d Find β when p = .6.
e Find β when p = .4.

10.3 Refer to Exercise 10.2.

a Find the rejection region of the form {y ≤ c} so that α ≈ .01.
b For the rejection region in part (a), find β when p = .6.
c For the rejection region in part (a), find β when p = .4.

10.4 Suppose that we wish to test the null hypothesis H0 that the proportion p of ledger sheets with
errors is equal to .05 versus the alternative Ha, that the proportion is larger than .05, by using
the following scheme. Two ledger sheets are selected at random. If both are error free, we reject
H0. If one or more contains an error, we look at a third sheet. If the third sheet is error free, we
reject H0. In all other cases, we accept H0.

a In terms of this problem, what is a type I error?
b What is the value of α associated with this test?
c In terms of this problem, what is a type II error?
d Calculate β = P(type II error) as a function of p.

10.5 Let Y1 and Y2 be independent and identically distributed with a uniform distribution over the
interval (θ, θ + 1). For testing H0: θ = 0 versus Ha: θ > 0, we have two competing tests:

Test 1: Reject H0 if Y1 > .95.
Test 2: Reject H0 if Y1 + Y2 > c.

Find the value of c so that test 2 has the same value for α as test 1. [Hint: In Example 6.3, we
derived the density and distribution function of the sum of two independent random variables
that are uniformly distributed on the interval (0, 1).]


10.6 We are interested in testing whether or not a coin is balanced based on the number of heads
Y on 36 tosses of the coin (H0: p = .5 versus Ha: p ≠ .5). If we use the rejection region
|y - 18| ≥ 4, what is

a the value of α?
b the value of β if p = .7?

10.7 True or False. Refer to Exercise 10.6.

a The level of the test computed in Exercise 10.6(a) is the probability that H0 is true.
b The value of β computed in Exercise 10.6(b) is the probability that Ha is true.
c In Exercise 10.6(b), β was computed assuming that the null hypothesis was false.
d If β was computed when p = 0.55, the value would be larger than the value of β obtained
in Exercise 10.6(b).
e The probability that the test mistakenly rejects H0 is β.
f Suppose that RR was changed to |y - 18| ≥ 2.

i This RR would lead to rejecting the null hypothesis more often than the RR used in
Exercise 10.6.
ii If α was computed using this new RR, the value would be larger than the value obtained
in Exercise 10.6(a).
iii If β was computed when p = .7 and using this new RR, the value would be larger than
the value obtained in Exercise 10.6(b).

*10.8 A two-stage clinical trial is planned for testing H0: p = .10 versus Ha: p > .10, where p
is the proportion of responders among patients who were treated by the protocol treatment. At
the first stage, 15 patients are accrued and treated. If 4 or more responders are observed among
the (first) 15 patients, H0 is rejected, the study is terminated, and no more patients are accrued.
Otherwise, another 15 patients will be accrued and treated in the second stage. If a total of 6
or more responders are observed among the 30 patients accrued in the two stages (15 in the
first stage and 15 more in the second stage), then H0 is rejected. For example, if 5 responders
are found among the first-stage patients, H0 is rejected and the study is over. However, if 2
responders are found among the first-stage patients, 15 second-stage patients are accrued, and
an additional 4 or more responders (for a total of 6 or more among the 30) are identified, H0 is
rejected and the study is over.¹

a Use the binomial table to find the numerical value of α for this testing procedure.
b Use the binomial table to find the probability of rejecting the null hypothesis when using
this rejection region if p = .30.
c For the rejection region defined above, find β if p = .30.


1. Exercises preceded by an asterisk are optional.


FIGURE 10.2 Sampling distributions of the estimator θ̂ for various values of θ

FIGURE 10.3 Large-sample rejection region for H0: θ = θ0 versus Ha: θ > θ0

Common Large-Sample Tests 


Suppose that we want to test a set of hypotheses concerning a parameter θ based on
a random sample Y1, Y2, ..., Yn. In this section, we will develop hypothesis-testing
procedures that are based on an estimator θ̂ that has an (approximately) normal
sampling distribution with mean θ and standard error σ_θ̂. The large-sample estimators
of Chapter 8 (Table 8.1), such as Ȳ and p̂, satisfy these requirements. So do the
estimators used to compare two population means (μ1 - μ2) and to compare
two binomial parameters (p1 - p2).

If θ0 is a specific value of θ, we may wish to test H0: θ = θ0 versus Ha: θ > θ0.
Figure 10.2 contains a graph illustrating the sampling distributions of θ̂ for various
values of θ. If θ̂ is close to θ0, it seems reasonable to accept H0. If in reality θ >
θ0, however, θ̂ is more likely to be large. Consequently, large values of θ̂ (values
larger than θ0 by a suitable amount) favor rejection of H0: θ = θ0 and acceptance
of Ha: θ > θ0. That is, the null and alternative hypotheses, the test statistic, and the
rejection region are as follows:


H0: θ = θ0.

Ha: θ > θ0.

Test statistic: θ̂.

Rejection region: RR = {θ̂ > k} for some choice of k.


The actual value of k in the rejection region RR is determined by fixing the type I
error probability α (the level of the test) and choosing k accordingly (see Figure 10.3).
If H0 is true, θ̂ has an approximately normal distribution with mean θ0 and standard
error σ_θ̂. Therefore, if we desire an α-level test,

\[
k = \theta_0 + z_\alpha \sigma_{\hat\theta}
\]

is the appropriate choice for k [if Z has a standard normal distribution, then z_α is such
that P(Z > z_α) = α]. Because

\[
RR = \{\hat\theta : \hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta}\}
   = \left\{\hat\theta : \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > z_\alpha\right\},
\]

if Z = (θ̂ - θ0)/σ_θ̂ is used as the test statistic, the rejection region may also be written
as RR = {z > z_α}. Notice that Z measures the number of standard errors between
the estimator for θ and θ0, the value of θ specified in H0. Thus, an equivalent form of
the test of hypothesis, with level α, is as follows:


H0: θ = θ0.

Ha: θ > θ0.

Test statistic: Z = (θ̂ - θ0)/σ_θ̂.

Rejection region: {z > z_α}.


H0 is rejected if Z falls far enough into the upper tail of the standard normal dis-
tribution. The alternative hypothesis Ha: θ > θ0 is called an upper-tail alternative,
and RR = {z > z_α} is referred to as an upper-tail rejection region. Notice that the
preceding formula for Z is simply

\[
Z = \frac{\text{estimator for the parameter} - \text{value for the parameter given by } H_0}
         {\text{standard error of the estimator}}.
\]

EXAMPLE 10.5

A vice president in charge of sales for a large corporation claims that salespeople are
averaging no more than 15 sales contacts per week. (He would like to increase this
figure.) As a check on his claim, n = 36 salespeople are selected at random, and the
number of contacts made by each is recorded for a single randomly selected week.
The mean and variance of the 36 measurements were 17 and 9, respectively. Does the
evidence contradict the vice president's claim? Use a test with level α = .05.

Solution

We are interested in the research hypothesis that the vice president's claim is incorrect.
This can be formally written as Ha: μ > 15, where μ is the mean number of sales
contacts per week. Thus, we are interested in testing

H0: μ = 15 against Ha: μ > 15.

We know that for large enough n, the sample mean Ȳ is a point estimator of μ that is
approximately normally distributed with μ_Ȳ = μ and σ_Ȳ = σ/√n. Hence, our test
statistic is

\[
Z = \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}.
\]

The rejection region, with α = .05, is given by {z > z_.05 = 1.645} (see Table 4,
Appendix 3). The population variance σ² is not known, but it can be estimated very
accurately (because n = 36 is sufficiently large) by the sample variance s² = 9.
Thus, the observed value of the test statistic is approximately

\[
z = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{17 - 15}{3/\sqrt{36}} = 4.
\]

Because the observed value of Z lies in the rejection region (because z = 4 exceeds
z_.05 = 1.645), we reject H0: μ = 15. Thus, at the α = .05 level of significance, the
evidence is sufficient to indicate that the vice president's claim is incorrect and that
the average number of sales contacts per week exceeds 15.
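The arithmetic of Example 10.5 can be reproduced in a few lines. This is a sketch using the Python standard library, with variable names chosen here for illustration; the critical value comes from `statistics.NormalDist` rather than from Table 4.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 10.5
n, ybar, s_squared = 36, 17.0, 9.0
mu0, alpha = 15.0, 0.05

# Observed test statistic, with s substituted for the unknown sigma
z = (ybar - mu0) / (sqrt(s_squared) / sqrt(n))

# Upper-tail critical value z_alpha, so that P(Z > z_alpha) = alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject = z > z_alpha

print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```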


EXAMPLE 10.6

A machine in a factory must be repaired if it produces more than 10% defectives
among the large lot of items that it produces in a day. A random sample of 100 items
from the day's production contains 15 defectives, and the supervisor says that the
machine must be repaired. Does the sample evidence support his decision? Use a test
with level .01.

Solution

If Y denotes the number of observed defectives, then Y is a binomial random variable,
with p denoting the probability that a randomly selected item is defective. Hence, we
want to test the null hypothesis

H0: p = .10 against the alternative Ha: p > .10.

The test statistic, which is based on p̂ = Y/n (the unbiased point estimator of p), is
given by

\[
Z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.
\]

We could have used √(p̂(1 - p̂)/n) to approximate the standard error of p̂, but because
we are considering the distribution of Z under H0, it is more appropriate to use
√(p0(1 - p0)/n), the true value of the standard error of p̂ when H0 is true.

From Table 4, Appendix 3, we see that P(Z > 2.33) = .01. Hence, we take
{z > 2.33} as the rejection region. The observed value of the test statistic is given by

\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{.15 - .10}{\sqrt{(.1)(.9)/100}} = \frac{.05}{.03} = 1.667.
\]

Because the observed value of Z is not in the rejection region, we cannot reject
H0: p = .10 in favor of Ha: p > .10. In terms of this application, we conclude that,
at the α = .01 level of significance, the evidence does not support the supervisor's
decision.

Is the supervisor wrong? We cannot make a statistical judgment about this until
we have evaluated the probability of accepting H0 when Ha is true, that is, until we
have calculated β. The method for calculating β is presented in Section 10.4.
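The calculation in Example 10.6 can be checked as follows. The sketch also evaluates the statistic with the estimated standard error √(p̂(1 - p̂)/n) mentioned in the solution, purely for comparison; that second computation and all variable names are illustrative additions, not part of the example.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 10.6
n, y = 100, 15
p0, alpha = 0.10, 0.01
p_hat = y / n

# Standard error under H0 (the choice made in the example)
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null

# Alternative mentioned in the text: standard error estimated from p_hat
se_estimated = sqrt(p_hat * (1 - p_hat) / n)
z_estimated = (p_hat - p0) / se_estimated

z_alpha = NormalDist().inv_cdf(1 - alpha)  # close to the tabled 2.33
print(f"z (null SE) = {z:.3f}, z (estimated SE) = {z_estimated:.3f}")
print(f"critical value = {z_alpha:.3f}, reject H0: {z > z_alpha}")
```

With these data the two standard errors lead to the same decision, although that need not hold in general.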


Testing H0: θ = θ0 against Ha: θ < θ0 is done in an analogous manner, except
that we now reject H0 for values of θ̂ that are much smaller than θ0. The test statistic
remains

\[
Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}},
\]

but for a fixed level α we reject the null hypothesis when z < -z_α. Because we
reject H0 in favor of Ha when z falls far enough into the lower tail of the standard
normal distribution, we call Ha: θ < θ0 a lower-tail alternative and RR: {z < -z_α}
a lower-tail rejection region.

In testing H0: θ = θ0 against Ha: θ ≠ θ0, we reject H0 if θ̂ is either much smaller
or much larger than θ0. The test statistic is still Z, as before, but the rejection
region is located symmetrically in the two tails of the probability distribution for Z.
Thus, we reject H0 if either z < -z_{α/2} or z > z_{α/2}. Equivalently, we reject H0 if
|z| > z_{α/2}. This test is called a two-tailed test, as opposed to the one-tailed tests
used for the alternatives θ < θ0 and θ > θ0. The rejection regions for the lower-tail
alternative, Ha: θ < θ0, and the two-sided alternative, Ha: θ ≠ θ0, are displayed in
Figure 10.4.

FIGURE 10.4 Rejection regions for testing H0: θ = θ0 versus (a) Ha: θ < θ0 and (b) Ha: θ ≠ θ0, based on Z = (θ̂ - θ0)/σ_θ̂. (Panel (a): reject H0 to the left of -z_α. Panel (b): reject H0 to the left of -z_{α/2} and to the right of z_{α/2}.)

A summary of the large-sample α-level hypothesis tests developed so far is given
next.


Large-Sample α-Level Hypothesis Tests

H0: θ = θ0.

Ha: θ > θ0 (upper-tail alternative).
    θ < θ0 (lower-tail alternative).
    θ ≠ θ0 (two-tailed alternative).

Test statistic: Z = (θ̂ - θ0)/σ_θ̂.

Rejection region: {z > z_α} (upper-tail RR).
                  {z < -z_α} (lower-tail RR).
                  {|z| > z_{α/2}} (two-tailed RR).

In any particular test, only one of the listed alternatives H, is appropriate. Whatever 
alternative hypothesis that we choose, we must be sure to use the corresponding 
rejection region. 
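The pairing of each alternative with its rejection region can be sketched as a single function. The function name, the string labels for the alternatives, and the inputs in the usage loop are all hypothetical choices made for illustration.

```python
from statistics import NormalDist

def large_sample_z_test(estimate, theta0, std_error, alpha=0.05, alternative="upper"):
    """Return (z, reject) for a large-sample alpha-level test of H0: theta = theta0.

    alternative is "upper" (theta > theta0), "lower" (theta < theta0),
    or "two-sided" (theta != theta0).
    """
    z = (estimate - theta0) / std_error
    std_normal = NormalDist()
    if alternative == "upper":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "lower":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'upper', 'lower', or 'two-sided'")
    return z, reject

# Hypothetical observed statistic z = 1.8, tested against each alternative
for alt in ("upper", "lower", "two-sided"):
    z, reject = large_sample_z_test(1.8, 0.0, 1.0, alpha=0.05, alternative=alt)
    print(f"{alt:9s}  z = {z:.2f}  reject H0: {reject}")
```

The same observed z can lead to rejection under one alternative and not under another, which is why the alternative must be fixed by the research question before the data are examined.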

How do we decide which alternative to use for a test? The answer depends on the 
hypothesis that we seek to support. If we are interested only in detecting an increase 
in the percentage of defectives (Example 10.6), we should locate the rejection region 
in the upper tail of the standard normal distribution. On the other hand, if we wish to 
detect a change in p either above or below p = .10, we should locate the rejection 
region in both tails of the standard normal distribution and employ a two-tailed test. 
The following example illustrates a situation in which a two-tailed test is appropriate. 


EXAMPLE 10.7

A psychological study was conducted to compare the reaction times of men and
women to a stimulus. Independent random samples of 50 men and 50 women were
employed in the experiment. The results are shown in Table 10.2. Do the data present
sufficient evidence to suggest a difference between true mean reaction times for men
and women? Use α = .05.

Table 10.2 Data for Example 10.7

Men                     Women
n1 = 50                 n2 = 50
ȳ1 = 3.6 seconds        ȳ2 = 3.8 seconds
s1² = .18               s2² = .14

Solution

Let μ1 and μ2 denote the true mean reaction times for men and women, respectively.
If we wish to test the hypothesis that the means differ, we must test H0: (μ1 - μ2) = 0
against Ha: (μ1 - μ2) ≠ 0. The two-sided alternative permits us to detect either the
case μ1 > μ2 or the reverse case μ2 > μ1; in either case, H0 is false.

The point estimator of (μ1 - μ2) is (Ȳ1 - Ȳ2). As we discussed in Sections 8.3 and
8.6, because the samples are independent and both are large, this estimator satisfies
the assumptions necessary to develop a large-sample test. Hence, if we desire to test
H0: μ1 - μ2 = D0 (where D0 is some fixed value) versus any alternative, the test
statistic is given by

\[
Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}},
\]

where σ1² and σ2² are the respective population variances. In this application, we want
to use a two-tailed test. Thus, for α = .05, we reject H0 for |z| > z_{α/2} = z_.025 = 1.96.

For large samples (say, n_i > 30), the sample variances provide good estimates of
their corresponding population variances. Substituting these values, along with ȳ1, ȳ2,
n1, n2, and D0 = 0, into the formula for the test statistic, we have

\[
z = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
  = \frac{3.6 - 3.8}{\sqrt{\dfrac{.18}{50} + \dfrac{.14}{50}}} = -2.5.
\]

This value is less than -z_{α/2} = -1.96 and therefore falls in the rejection region.
Hence, at the α = .05 level, we conclude that sufficient evidence exists to permit us
to conclude that mean reaction times differ for men and women.
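The two-sample statistic in Example 10.7 can be verified numerically. The sketch below uses the values from Table 10.2; the variable names are illustrative, and the critical value is obtained from `statistics.NormalDist` rather than from the normal table.

```python
from math import sqrt
from statistics import NormalDist

# Data from Table 10.2
n1, ybar1, s1_squared = 50, 3.6, 0.18  # men
n2, ybar2, s2_squared = 50, 3.8, 0.14  # women
d0, alpha = 0.0, 0.05

# Estimated standard error of (Ybar1 - Ybar2), using sample variances
std_error = sqrt(s1_squared / n1 + s2_squared / n2)
z = ((ybar1 - ybar2) - d0) / std_error

# Two-tailed test: reject H0 when |z| exceeds z_{alpha/2}
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_half_alpha

print(f"standard error = {std_error:.3f}, z = {z:.2f}")
print(f"critical value = {z_half_alpha:.2f}, reject H0: {reject}")
```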


In this section, we have described the general procedure for implementing large-
sample tests of hypotheses for some parameters of frequent practical interest. We
will discuss in Section 10.4 how to calculate β, the probability of a type II error, for
these large-sample tests. Constructing confidence intervals for these parameters and
implementing formal tests of hypotheses are remarkably similar. Both procedures use
the estimators of the respective parameters, the standard errors of these estimators, and
quantities obtained from the table of the standard normal distribution. In Section 10.5,
we will explicitly point out a correspondence between large-sample testing procedures
and large-sample confidence intervals.


Exercises


10.9 Applet Exercise Use the applet Hypothesis Testing (for Proportions) to assess the impact of
changing the sample size on the value of α. When you access the applet, the default settings
will permit simulations, when the true value of p = .5, of repeated α = .05 level Z-tests for
H0: p = .5 versus Ha: p ≠ .5 and n = 15.


a What action qualifies as an “error” in the scenario to be simulated?

b Click the button “Draw Sample” to obtain the results associated with a single sample of
size 15. How many successes resulted? What is the value for p̂? Compute the value of the
large-sample test statistic. Does your calculation agree with the value of z given in the table
beneath the normal curve? Does the value of z fall in the rejection region? Did the result
of this simulation result in an error?

c Click the button “Draw Sample” five more times. How many different values for z did
you observe? How many values appeared in the rejection region given by the tails of the
normal curve?

d Click the button “Draw Sample” until you obtain a simulated sample that results in rejecting
H0. What was the value of p̂ that led to rejection of H0? How many tests did you perform
until you first rejected H0? Why did it take so many simulations until you first rejected
the null?

e Click the button “Draw 50 Samples” until you have completed 200 or more simulations.
Hover the pointer over the shaded box above “Reject” in the bottom bar graph. What
proportion of the simulations resulted in rejecting H0?

f Why are the boxes above “Reject” and “Error” of exactly the same height?

g Use the up and down arrows to the right of the “n for sample” line to change the sample size
for each simulation to 20. Click the button “Draw 50 Samples” until you have simulated at
least 200 tests. What proportion of the simulations resulted in rejecting H0?

h Repeat the instructions in part (g) for samples of size 30, 40, and 50. Click the button “Show
Summary” to see the results of all simulations that you performed thus far. What do you
observe about the proportions of times that H0 is rejected using samples of size 15, 20, 30,
40, and 50? Are you surprised by these results? Why?


10.10 Applet Exercise Refer to Exercise 10.9. Click the button “Clear Summary” to delete the results
of any previous simulations. Change the sample size for each simulation to n = 30 and leave
the null and alternative hypotheses at their default settings H0: p = .5, Ha: p ≠ .5.

a Leave the true value of p at its default setting p = .5. With this scenario, what is an error?
Simulate at least 200 tests. What proportion of the tests resulted in rejecting H0? What
do you notice about the heights of the boxes above “Reject” and “Error” in the bottom
right-hand graph? Why?

b Leave all settings unchanged except change the true value of p to .6. With this modification,
what is an error? Simulate at least 200 tests. What proportion of the tests resulted in rejecting
H0? What do you notice about the heights of the boxes above “Reject” and “Error” in the
bottom right-hand graph? Why?

c Leave all settings from part (b) unchanged except change the true value of p to .7. Simulate
at least 200 tests. Repeat, setting the true value of p to .8. Click the button “Show Summary.”
As the true value of p moves further from .5 and closer to 1, what do you observe about the
proportion of simulations that lead to rejection of H0? What would you expect to observe
if a set of simulations was conducted when the true value of p is .9?

d What would you expect to observe if simulations were repeated when the real value of p
is .4, .3, and .2? Try it.


10.11 Applet Exercise In Exercise 10.9(h), you observed that when the null hypothesis is true, for
all sample sizes the proportion of the time H0 is rejected is approximately equal to α, the
probability of a type I error. If we test H0: p = .5, Ha: p ≠ .5, what happens to the value of
β when the sample size increases? Set the real value of p to .6 and keep the rest of the settings
at their default values (α = .05, n = 15).

a In the scenario to be simulated, what is the only kind of error that can be made?

b Click the button “Clear Summary.” Conduct at least 200 simulations. What proportion of
the simulations resulted in type II errors (hover the pointer over the box above “Error” in the
lower right portion of the display)? How is the proportion of type II errors related to the
proportion of times that H0 is rejected?

c Change n, the number of trials used for each simulated test, to 30 and leave all other settings
unchanged. Simulate at least 200 tests. Repeat for n = 50 and n = 100. Click the button
“Show Summary.” How do the values of β(.6), the probability of a type II error when
p = .6, change as the sample size increases?

d Leave the window with the summary information open and continue with Exercise 10.12.

10.12 Applet Exercise Refer to Exercise 10.11. Change α to .1 but keep H0: p = .5, Ha: p ≠ .5 and
the true value of p = .6. Simulate at least 200 tests when n = 15. Repeat for n = 30, 50, and
100. Click on the button “Show Summary.” You will now have two summary tables (it might
be necessary to drag the last table from on top of the first). Compare the error rates when tests
are simulated using 15, 30, 50, and 100 trials.

a Which of the two tests, α = .05 or α = .10, gives the smaller simulated values for β, using
samples of size 15?

b Which gives the smaller simulated values for β for each of the other sample sizes?


10.13 Applet Exercise If you were to repeat the instructions of Exercise 10.10, using n = 100 instead
of n = 30, what would you expect to be similar? What would you expect to be different?


10.14 Applet Exercise Refer to Exercise 10.9. Set up the applet to test H0: p = .1 versus Ha: p < .1
by clicking the radio button “Lower” in the line labeled “Tail” and adjusting the hypothesized
value to .1. Set the true value of p = .1, n = 5, and α = .20.

a Click the button “Draw Sample” until you obtain a sample with zero successes. What is
the value of z? What is the smallest possible value for z? Is it possible that you will get a
sample so that the value of z falls in the rejection region? What does this imply about the
probability that the “large sample” test procedure will reject the null hypothesis? Does this
result invalidate the use of large sample tests for a proportion?

b Will the test from part (a) reject the true null approximately 20% of the time if we use
n = 10? Try it by simulating at least 100 tests. What proportion of the simulations result
in rejection of the null hypothesis?

c Look through the values of p̂ in the table under the normal curve and identify the value of
p̂ for which the null is rejected. Use the tables in the appendix to compute the probability
of observing this value when n = 10 and p = .1. Is this value close to .2?

d Is n = 100 large enough so that the simulated proportion of rejects is close to .2? Simulate
at least 100 tests and give your answer based on the simulation.


Applet Exercise Refer to Exercise 10.10. Click the button “Clear Summary” to delete the 
results of any previous simulations. Change the sample size for each simulation to n = 30 
and set up the applet to simulate testing H₀: p = .4 versus Hₐ: p > .4 at the .05 level of 
significance. 


a Click the button “Clear Summary” to erase the results of any previous simulations. Set 
the real value of p to .4 and implement at least 200 simulations. What percentage of the 
simulated tests result in rejecting the null hypothesis? Does the test work as you 
expected? 

b Leave all settings as they were in part (a) but change the real value of p to .5. Simulate 
at least 200 tests. Repeat when the real value of p is .6 and .7. Click the button “Show 
Summary.” What do you observe about the rejection rate as the true value of p gets further 
from .4 and closer to 1? Does the pattern that you observe match your impression of how 
a good test should perform? 


Applet Exercise Refer to Exercise 10.15. Again, we wish to assess the performance of the 
test for H₀: p = .4 versus Hₐ: p > .4 at the .05 level of significance using samples of size 30. 


a If the true value of p is .3, is accepting the alternative hypothesis a correct or incorrect 
decision? 

b Click the button “Clear Summary.” Change the real value of p to .3 and simulate at 
least 200 tests. What fraction of the simulations resulted in accepting the alternative 
hypothesis? 

c Change the real value of p to .2 and simulate at least 200 tests. Click the button “Show 
Summary.” Does anything look wrong? 


A survey published in the American Journal of Sports Medicine² reported the number of 
meters (m) per week swum by two groups of swimmers—those who competed exclusively in 
breaststroke and those who competed in the individual medley (which includes breaststroke). 
The number of meters per week practicing the breaststroke was recorded for each swimmer, and 
the summary statistics are given below. Is there sufficient evidence to indicate that the average 
number of meters per week spent practicing breaststroke is greater for exclusive breaststrokers 
than it is for those swimming individual medley? 


                                              Specialty
                                 Exclusively Breaststroke    Individual Medley
Sample size                      130                         80
Sample mean (m)                  9017                        5853
Sample standard deviation (m)    7162                        1961
Population mean                  μ₁                          μ₂

a State the null and alternative hypotheses. 

b What is the appropriate rejection region for an α = .01 level test? 

c Calculate the observed value of the appropriate test statistic. 

d What is your conclusion? 

e What is a practical reason for the conclusion you reached in part (d)? 


The hourly wages in a particular industry are normally distributed with mean $13.20 and 
standard deviation $2.50. A company in this industry employs 40 workers, paying them an 
average of $12.20 per hour. Can this company be accused of paying substandard wages? Use 
an α = .01 level test. 


The output voltage for an electric circuit is specified to be 130. A sample of 40 independent 
readings on the voltage for this circuit gave a sample mean 128.6 and standard deviation 2.1. 
Test the hypothesis that the average output voltage is 130 against the alternative that it is less 
than 130. Use a test with level .05. 


The Rockwell hardness index for steel is determined by pressing a diamond point into the 
steel and measuring the depth of penetration. For 50 specimens of an alloy of steel, the 
Rockwell hardness index averaged 62 with standard deviation 8. The manufacturer claims that this 
alloy has an average hardness index of at least 64. Is there sufficient evidence to refute the 
manufacturer's claim at the 1% significance level? 


Shear strength measurements derived from unconfined compression tests for two types of soils 
gave the results shown in the following table (measurements in tons per square foot). Do the 
soils appear to differ with respect to average shear strength, at the 1% significance level? 


Soil Type I      Soil Type II
n₁ = 30          n₂ = 35
ȳ₁ = 1.65        ȳ₂ = 1.43
s₁ = 0.26        s₂ = 0.22


2. Source: Kurt Grote, T. L. Lincoln, and J. G. Gamble, “Hip Adductor Injury in Competitive Swimmers,” 
American Journal of Sports Medicine 32(1) (2004): 104. 


In Exercise 8.66, we examined the results of a 2001 study by Leonard, Speziale, and Pernick 
comparing traditional and activity-oriented methods for teaching biology. Pretests were given 
to students who were subsequently taught by one of the two methods. Summary statistics were 
given for the pretest scores for 368 students who were subsequently taught using the traditional 
method and 372 who were taught using the activity-oriented method. 


a Without looking at the data, would you expect there to be a difference in the mean pretest 
scores for those subsequently taught using the different methods? Based on your conjecture, 
what alternative hypothesis would you choose to test versus the null hypothesis that there 
is no difference in the mean pretest scores for the two groups? 


b Does the alternative hypothesis that you posed in part (a) correspond to a one-tailed or a 
two-tailed statistical test? 

c The mean and standard deviation of the pretest scores for those subsequently taught using 
the traditional method were 14.06 and 5.45, respectively. For those subsequently taught 
using the activity-oriented method, the respective corresponding mean and standard 
deviation were 13.38 and 5.59. Do the data provide support for the conjecture that the mean 
pretest scores do not differ for students subsequently taught using the two methods? Test 
using α = .01. 


Studies of the habits of white-tailed deer indicate that these deer live and feed within very 
limited ranges, approximately 150 to 205 acres. To determine whether the ranges of deer 
located in two different geographical areas differ, researchers caught, tagged, and fitted 40 deer 
with small radio transmitters. Several months later, the deer were tracked and identified, and 
the distance y from the release point was recorded. The mean and standard deviation of the 
distances from the release point were as given in the accompanying table.³ 


                                      Location
                                  1         2
Sample size                       40        40
Sample mean (ft)                  2980      3205
Sample standard deviation (ft)    1140      963
Population mean                   μ₁        μ₂


a If you have no preconceived reason for believing that one population mean is larger than 
the other, what would you choose for your alternative hypothesis? Your null hypothesis? 

b Would your alternative hypothesis in part (a) imply a one-tailed or a two-tailed test? Explain. 

c Do the data provide sufficient evidence to indicate that the mean distances differ for the 
two geographical locations? Test using α = .10. 


A study by Children's Hospital in Boston indicates that about 67% of American adults and about 
15% of children and adolescents are overweight.⁴ Thirteen children in a random sample of size 
100 were found to be overweight. Is there sufficient evidence to indicate that the percentage 
reported by Children's Hospital is too high? Test at the α = 0.05 level of significance. 


An article in American Demographics reports that 67% of American adults always vote in 
presidential elections.⁵ To test this claim, a random sample of 300 adults was taken, and 192 
stated that they always voted in presidential elections. Do the results of this sample provide 
sufficient evidence to indicate that the percentage of adults who say that they always vote in 
presidential elections is different from the percentage reported in American Demographics? 
Test using α = .01. 


3. Source: Charles Dickey, “A Strategy for Big Bucks,” Field and Stream, October 1990. 

4. Source: Judy Holland, “‘Cheeseburger Bill’ on the Menu,” Press-Enterprise (Riverside, Calif.), 
March 9, 2004, p. E1. 

5. Source: Christopher Reynolds, “Rocking the Vote,” American Demographics, February 2004, p. 48. 


According to the Washington Post, nearly 45% of all Americans are born with brown eyes, 
although their eyes don’t necessarily stay brown. A random sample of 80 adults found 32 
with brown eyes. Is there sufficient evidence at the .01 level to indicate that the proportion of 
brown-eyed adults differs from the proportion of Americans who are born with brown eyes? 


The state of California is working very hard to ensure that all elementary age students whose 
native language is not English become proficient in English by the sixth grade. Their progress 
is monitored each year using the California English Language Development test. The results 
for two school districts in southern California for the 2003 school year are given in the 
accompanying table.⁷ Do the data indicate a significant difference in the 2003 proportions of students 
who are fluent in English for the two districts? Use α = .01. 


District                     Riverside    Palm Springs
Number of students tested    6124         5512
Percentage fluent            40           37


The commercialism of the U.S. space program has been a topic of great interest since Dennis 
Tito paid $20 million to ride along with the Russian cosmonauts on the space shuttle.⁸ In a 
survey of 500 men and 500 women, 20% of the men and 26% of the women responded that 
space should remain commercial free. 


a Does statistically significant evidence exist to suggest that there is a difference in the 
population proportions of men and women who think that space should remain commercial 
free? Use a .05 level test. 

b Why is a statistically significant difference in these population proportions of practical 
importance to advertisers? 


A manufacturer of automatic washers offers a model in one of three colors: A, B, or C. Of 
the first 1000 washers sold, 400 were of color A. Would you conclude that customers have a 
preference for color A? Justify your answer. 


A manufacturer claimed that at least 20% of the public preferred her product. A sample of 100 
persons is taken to check her claim. With α = .05, how small would the sample percentage 
need to be before the claim could legitimately be refuted? (Notice that this would involve a 
one-tailed test of the hypothesis.) 


What conditions must be met for the Z test to be used to test a hypothesis concerning a 
population mean μ? 


In March 2001, a Gallup poll asked, “How would you rate the overall quality of the environment 
in this country today—as excellent, good, fair or poor?” Of 1060 adults nationwide, 46% gave 
a rating of excellent or good. Is this convincing evidence that a majority of the nation's adults 
think the quality of the environment is fair or poor? Test using α = .05. 


6. Source: “Seeing the World Through Tinted Lenses,” Washington Post, March 16, 1993, p. 5. 


7. Source: Cadonna Peyton, “Pupils Build English Skills,” Press-Enterprise (Riverside, Calif.), March 19, 
2004, p. B-1. 


8. Source: Adapted from “Toplines: To the Moon?” American Demographics, August 2001, p. 9. 


A political researcher believes that the fraction p₁ of Republicans strongly in favor of the death 
penalty is greater than the fraction p₂ of Democrats strongly in favor of the death penalty. He 
acquired independent random samples of 200 Republicans and 200 Democrats and found 46 
Republicans and 34 Democrats strongly favoring the death penalty. Does this evidence provide 
statistical support for the researcher’s belief? Use α = .05. 


Exercise 8.58 stated that a random sample of 500 measurements on the length of stay in hospitals 
had sample mean 5.4 days and sample standard deviation 3.1 days. A federal regulatory agency 
hypothesizes that the average length of stay is in excess of 5 days. Do the data support this 
hypothesis? Use α = .05. 


Michael Sosin⁹ investigated determinants that account for individuals’ making a transition from 
having a home (domiciled) but using meal programs to becoming homeless. The following 
table contains the data obtained in the study. Is there sufficient evidence to indicate that the 
proportion of those currently working is larger for domiciled men than for homeless men? Use 
α = .01. 


                            Homeless Men    Domiciled Men
Sample size                 112             260
Number currently working    34              98

Refer to Exercise 8.68(b). Is there evidence of a difference between the proportion of residents 
favoring complete protection of alligators and the proportion favoring their destruction? Use 
α = .01. 


Calculating Type II Error Probabilities 
and Finding the Sample Size for Z Tests 


Calculating β can be very difficult for some statistical tests, but it is easy for the tests 
developed in Section 10.3. Consequently, we can use the Z test to demonstrate both 
the calculation of β and the logic employed in selecting the sample size for a test. 

For the test H₀: θ = θ₀ versus Hₐ: θ > θ₀, we can calculate type II error probabilities 
only for specific values for θ in Hₐ. Suppose that the experimenter has in mind 
a specific alternative, say θ = θₐ (where θₐ > θ₀). Because the rejection region is 
of the form 

\[ RR = \{\hat{\theta} : \hat{\theta} > k\}, \]

the probability β of a type II error is 

\[ \beta = P(\hat{\theta} \text{ is not in RR when } H_a \text{ is true})
        = P(\hat{\theta} \le k \text{ when } \theta = \theta_a)
        = P\left( \frac{\hat{\theta} - \theta_a}{\sigma_{\hat{\theta}}} \le \frac{k - \theta_a}{\sigma_{\hat{\theta}}} \text{ when } \theta = \theta_a \right). \]

9. Source: Michael Sosin, “Homeless and Vulnerable Meal Program Users: A Comparison Study,” Social 
Problems 39(2) (1992). 


If θₐ is the true value of θ, then (θ̂ − θₐ)/σ_θ̂ has approximately a standard normal 
distribution. Consequently, β can be determined (approximately) by finding a 
corresponding area under a standard normal curve. 

For a fixed sample of size n, the size of β depends on the distance between θₐ and 
θ₀. If θₐ is close to θ₀, the true value of θ (either θ₀ or θₐ) is difficult to detect, and 
the probability of accepting H₀ when Hₐ is true tends to be large. If θₐ is far from θ₀, 
the true value is relatively easy to detect, and β is considerably smaller. As we saw in 
Section 10.2, for a specified value of α, β can be made smaller by choosing a large 
sample size n. 


EXAMPLE 10.8 

Suppose that the vice president in Example 10.5 wants to be able to detect a difference 
equal to one call in the mean number of customer calls per week. That is, he wishes 
to test H₀: μ = 15 against Hₐ: μ = 16. With the data as given in Example 10.5, find 
β for this test. 


Solution 

In Example 10.5, we had n = 36, ȳ = 17, and s² = 9. The rejection region for a .05 
level test was given by 

\[ z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > 1.645, \]

which is equivalent to 

\[ \bar{y} - \mu_0 > 1.645\left(\frac{\sigma}{\sqrt{n}}\right) \quad\text{or}\quad \bar{y} > \mu_0 + 1.645\left(\frac{\sigma}{\sqrt{n}}\right). \]

Substituting μ₀ = 15 and n = 36 and using s to approximate σ, we find the rejection 
region to be 

\[ \bar{y} > 15 + 1.645\left(\frac{3}{\sqrt{36}}\right), \quad\text{or equivalently,}\quad \bar{y} > 15.8225. \]

This rejection region is shown in Figure 10.5. Then, by definition, β = P(Ȳ ≤ 
15.8225 when μ = 16) is given by the shaded area under the dashed curve to the left 
of k = 15.8225 in Figure 10.5. Thus, for μₐ = 16, 

\[ \beta = P\left( \frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} \le \frac{15.8225 - 16}{3/\sqrt{36}} \right) = P(Z \le -.36) = .3594. \]

FIGURE 10.5 Rejection region for Example 10.8 (k = 15.8225). [Figure: sampling distributions of Ȳ 
centered at μ₀ = 15 and at μₐ = 16 (dashed), with the areas α and β marked on either side of k; 
accept H₀ to the left of k, reject H₀ to the right.] 

The large value of β tells us that samples of size n = 36 frequently will fail to 
detect a difference of 1 unit from the hypothesized mean. We can reduce the value 
of β by increasing the sample size n. 
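The calculation in Example 10.8 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name type2_error_upper_tail is illustrative, and it assumes Python's standard-library statistics.NormalDist in place of the normal table. Because it does not round z to two decimals, it gives a value near .361 rather than the table-based .3594. It also evaluates β at a larger sample size and at a more distant alternative, both of which are assumptions chosen only to show the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist

def type2_error_upper_tail(mu0, mua, sigma, n, alpha):
    """Approximate beta for the upper-tail Z test of H0: mu = mu0 when mu = mua > mu0."""
    std_normal = NormalDist()                        # standard normal distribution
    se = sigma / sqrt(n)                             # standard error of the sample mean
    k = mu0 + std_normal.inv_cdf(1 - alpha) * se     # rejection region is ybar > k
    return std_normal.cdf((k - mua) / se)            # beta = P(Ybar <= k when mu = mua)

# Example 10.8: mu0 = 15, mua = 16, s = 3 used for sigma, n = 36, alpha = .05
beta_36 = type2_error_upper_tail(mu0=15, mua=16, sigma=3, n=36, alpha=0.05)

# Hypothetical variations (not from the text): a larger sample, a more distant alternative
beta_100 = type2_error_upper_tail(mu0=15, mua=16, sigma=3, n=100, alpha=0.05)
beta_far = type2_error_upper_tail(mu0=15, mua=17, sigma=3, n=36, alpha=0.05)

print(f"n = 36,  mua = 16: beta = {beta_36:.4f}")
print(f"n = 100, mua = 16: beta = {beta_100:.4f}")
print(f"n = 36,  mua = 17: beta = {beta_far:.4f}")
```

Both variations give a smaller β than the original setting, in line with the discussion of how β depends on n and on the distance between μₐ and μ₀.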


The preceding example suggests the procedure that an experimenter employs when 
choosing the sample size(s) for an experiment. Suppose that you want to test H₀: μ = 
μ₀ versus Hₐ: μ > μ₀. If you specify the desired values of α and β (where β 
is evaluated when μ = μₐ and μₐ > μ₀), any further adjustment of the test must 
involve two remaining quantities: the sample size n and the point at which the rejection 
region begins, k. Because α and β can be written as probabilities involving n and k, 
we have two equations in two unknowns, which can be solved simultaneously for n. 
Thus, 

\[ \alpha = P(\bar{Y} > k \text{ when } \mu = \mu_0)
         = P\left( \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > \frac{k - \mu_0}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_0 \right) = P(Z > z_\alpha), \]

\[ \beta = P(\bar{Y} \le k \text{ when } \mu = \mu_a)
        = P\left( \frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} \le \frac{k - \mu_a}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_a \right) = P(Z \le -z_\beta). \]

(See Figure 10.5.) 
From the previous equations for α and β, we have 

\[ \frac{k - \mu_0}{\sigma/\sqrt{n}} = z_\alpha \quad\text{and}\quad \frac{k - \mu_a}{\sigma/\sqrt{n}} = -z_\beta. \]

Solving both of the above equations for k gives 

\[ k = \mu_0 + z_\alpha\left(\frac{\sigma}{\sqrt{n}}\right) = \mu_a - z_\beta\left(\frac{\sigma}{\sqrt{n}}\right). \]

Thus, 

\[ (z_\alpha + z_\beta)\left(\frac{\sigma}{\sqrt{n}}\right) = \mu_a - \mu_0, \quad\text{or equivalently,}\quad \sqrt{n} = \frac{(z_\alpha + z_\beta)\,\sigma}{\mu_a - \mu_0}. \]


Sample Size for an Upper-Tail α-Level Test 

\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} \]


Exactly the same solution would be obtained for a one-tailed alternative, Hₐ: μ = 
μₐ with μₐ < μ₀. The method just employed can be used to develop a similar 
formula for sample size for any one-tailed, hypothesis-testing problem that satisfies 
the conditions of Section 10.3. 


EXAMPLE 10.9 

Suppose that the vice president of Example 10.5 wants to test H₀: μ = 15 against 
Hₐ: μ = 16 with α = β = .05. Find the sample size that will ensure this accuracy. 
Assume that σ² is approximately 9. 


Solution 

Because α = β = .05, it follows that z_α = z_β = z_{.05} = 1.645. Then 

\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 1.645)^2 (9)}{(16 - 15)^2} = 97.4. \]

Hence, n = 98 observations should be used to meet the requirement that α ≈ β ≈ .05 
for the vice president’s test. 
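The sample-size formula is easy to evaluate directly. The sketch below is illustrative only: the function name sample_size_one_tailed is an assumption, and statistics.NormalDist from the Python standard library stands in for the normal table, so z_{.05} is taken as about 1.64485 rather than the rounded 1.645.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_tailed(mu0, mua, sigma, alpha, beta):
    """Return (exact n, n rounded up) for a one-tailed Z test with the given alpha and beta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # upper-tail critical value z_alpha
    z_beta = std_normal.inv_cdf(1 - beta)     # upper-tail critical value z_beta
    n_exact = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mua - mu0) ** 2
    return n_exact, ceil(n_exact)             # round up so alpha and beta are not exceeded

# Example 10.9: mu0 = 15, mua = 16, sigma^2 = 9, alpha = beta = .05
n_exact, n_needed = sample_size_one_tailed(mu0=15, mua=16, sigma=3, alpha=0.05, beta=0.05)
print(f"n from the formula: {n_exact:.1f}; observations to use: {n_needed}")
```

Rounding up rather than to the nearest integer is the conservative choice, since a smaller n would leave β slightly above the target.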


Exercises 


10.37 Refer to Exercise 10.19. If the voltage falls as low as 128, serious consequences may result. 
For testing H₀: μ = 130 versus Hₐ: μ = 128, find the probability of a type II error, β, for the 
rejection region used in Exercise 10.19. 


10.38 Refer to Exercise 10.20. The steel is sufficiently hard to meet usage requirements if the mean 
Rockwell hardness measure does not drop below 60. Using the rejection region found in 
Exercise 10.20, find β for the specific alternative μₐ = 60. 


10.39 Refer to Exercise 10.30. Calculate the value of β for the alternative pₐ = .15. 


10.40 Refer to Exercise 10.33. The political researcher should have designed a test for which β is 
tolerably low when p₁ exceeds p₂ by a meaningful amount. For example, find a common 
sample size n for a test with α = .05 and β ≤ .20 when in fact p₁ exceeds p₂ by .1. [Hint: The 
maximum value of p(1 − p) is .25.] 


10.41 Refer to Exercise 10.34. Using the rejection region found there, calculate β when μₐ = 5.5. 


10.42 In Exercises 10.34 and 10.41, how large should the sample size be if we require that α = .01 
and β = .05 when μₐ = 5.5? 


10.43 A random sample of 37 second graders who participated in sports had manual dexterity scores 
with mean 32.19 and standard deviation 4.34. An independent sample of 37 second graders 
who did not participate in sports had manual dexterity scores with mean 31.68 and standard 
deviation 4.56. 


a Test to see whether sufficient evidence exists to indicate that second graders who participate 
in sports have a higher mean dexterity score. Use α = .05. 


b For the rejection region used in part (a), calculate β when μ₁ − μ₂ = 3. 


10.44 Refer to Exercise 10.43. Find the sample sizes that give α = .05 and β = .05 when μ₁ − μ₂ = 3. 
(Assume equal-size samples for each group.) 


Relationships Between Hypothesis-Testing 
Procedures and Confidence Intervals 


Thus far, we have considered two large-sample procedures for making inferences 
about a target parameter θ. In Section 8.6, we observed that if θ̂ is an estimator for 
θ that has an approximately normal sampling distribution, a two-sided confidence 
interval for θ with confidence coefficient 1 − α is given by 

\[ \hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}. \]

In this expression, σ_θ̂ is the standard error of the estimator θ̂ (the standard deviation 
of the sampling distribution of θ̂), and z_{α/2} is a number obtained using the standard 
normal table and such that P(Z > z_{α/2}) = α/2. For large samples, if we were 
interested in an α-level test of H₀: θ = θ₀ versus the two-sided alternative Hₐ: θ ≠ θ₀, 
the results of the previous section indicate that we would use a Z test based on the 
test statistic 

\[ Z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} \]

and would reject H₀ if the value of Z fell in the rejection region {|z| > z_{α/2}}. Both 
of these procedures make heavy use of the estimator θ̂, its standard error σ_θ̂, and the 
table value z_{α/2}. Let us explore these two procedures more fully. 

The complement of the rejection region associated with any test is sometimes 
called the acceptance region for the test. For any of our large-sample, two-tailed 
α-level tests, the acceptance region (the complement of RR) is given by 
{−z_{α/2} ≤ z ≤ z_{α/2}}. That is, we do not reject H₀: θ = θ₀ in favor of the 
two-tailed alternative if 

\[ -z_{\alpha/2} \le \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}. \]

Restated, the null hypothesis is not rejected (is “accepted”) at level α if 

\[ \hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta_0 \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}. \]

Notice that the quantities on the far left and far right of the previous string of 
inequalities are the lower and upper endpoints, respectively, of a 100(1 − α)% two-sided 
confidence interval for θ. Thus, a duality exists between our large-sample procedures 
for constructing a 100(1 − α)% two-sided confidence interval and for implementing 
a two-sided hypothesis test with level α. Do not reject H₀: θ = θ₀ in favor of 
Hₐ: θ ≠ θ₀ if the value θ₀ lies inside a 100(1 − α)% confidence interval for θ. Reject 
H₀ if θ₀ lies outside the interval. Equivalently, a 100(1 − α)% two-sided confidence 
interval can be interpreted as the set of all values of θ₀ for which H₀: θ = θ₀ is 
“acceptable” at level α. Notice that any value inside the confidence interval is an 
acceptable value of the parameter. There is not one acceptable value for the parameter 
but many (indeed, the infinite number of values inside the interval). For this reason, 
we usually do not accept the null hypothesis that θ = θ₀, even if the value θ₀ falls 
inside our confidence interval. We recognize that many values of θ are acceptable and 
refrain from accepting a single θ value as being the true value. Additional comments 
regarding hypothesis testing are contained in Section 10.7. 
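The duality between the two-sided test and the two-sided confidence interval can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the estimate 10.2, the standard error 0.4, and the candidate null values are hypothetical numbers that do not come from the text, and the function name two_sided_test_and_interval is likewise made up for this example.

```python
from statistics import NormalDist

def two_sided_test_and_interval(theta_hat, se, theta0, alpha):
    """Run the large-sample two-sided Z test and build the matching confidence interval."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # table value z_{alpha/2}
    z = (theta_hat - theta0) / se                    # observed test statistic
    reject = abs(z) > z_crit                         # rejection region {|z| > z_{alpha/2}}
    interval = (theta_hat - z_crit * se, theta_hat + z_crit * se)
    inside = interval[0] <= theta0 <= interval[1]    # does theta0 lie in the interval?
    return reject, interval, inside

# Hypothetical estimate, standard error, and candidate null values (assumptions)
results = []
for theta0 in (9.0, 9.5, 10.0, 10.5, 11.0):
    reject, interval, inside = two_sided_test_and_interval(10.2, 0.4, theta0, alpha=0.05)
    results.append((theta0, reject, inside))
    print(f"theta0 = {theta0}: reject H0 = {reject}, theta0 inside 95% interval = {inside}")
```

For every candidate value, the test rejects exactly when θ₀ falls outside the interval, which is the duality described above.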

Our previous discussion focused on the duality between two-sided confidence 
intervals and two-sided hypothesis tests. In the exercises that follow this section, you 
will be asked to demonstrate the correspondence between large-sample, one-sided 
hypothesis tests of level α and the construction of the appropriate upper or lower 
bounds with confidence coefficients 1 − α. If you desire an α-level test of H₀: θ = θ₀ 
versus Hₐ: θ > θ₀ (an upper-tail test), you should accept the alternative hypothesis 
if θ₀ is less than a 100(1 − α)% lower confidence bound for θ. If the appropriate 
alternative hypothesis is Hₐ: θ < θ₀ (a lower-tail test), you should reject H₀: θ = θ₀ 
in favor of Hₐ if θ₀ is larger than a 100(1 − α)% upper confidence bound for θ. 


Exercises 


Refer to Exercise 10.21. Construct a 99% confidence interval for the difference in mean shear 
strengths for the two soil types. 


a Is the value μ₁ − μ₂ = 0 inside or outside this interval? 


b Based on the interval, should the null hypothesis discussed in Exercise 10.21 be rejected? 
Why? 


c How does the conclusion that you reached compare with your conclusion in Exercise 10.21? 


A large-sample α-level test of hypothesis for H₀: θ = θ₀ versus Hₐ: θ > θ₀ rejects the null 
hypothesis if 

\[ \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} > z_\alpha. \]

Show that this is equivalent to rejecting H₀ if θ₀ is less than the large-sample 100(1 − α)% 
lower confidence bound for θ. 


Refer to Exercise 10.32. Construct a 95% lower confidence bound for the proportion of the 
nation’s adults who think the quality of the environment is fair or poor. 


a How does the value p = .50 compare to this lower bound? 


b Based on the lower bound in part (a), should the alternative hypothesis of Exercise 10.32 
be accepted? 


c Is there any conflict between the answer in part (b) and your answer to Exercise 10.32? 


A large-sample α-level test of hypothesis for H₀: θ = θ₀ versus Hₐ: θ < θ₀ rejects the null 
hypothesis if 

\[ \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} < -z_\alpha. \]

Show that this is equivalent to rejecting H₀ if θ₀ is greater than the large-sample 100(1 − α)% 
upper confidence bound for θ. 


Refer to Exercise 10.19. Construct a 95% upper confidence bound for the average voltage 
reading. 

a How does the value μ = 130 compare to this upper bound? 

b Based on the upper bound in part (a), should the alternative hypothesis of Exercise 10.19 
be accepted? 

c Is there any conflict between the answer in part (b) and your answer to Exercise 10.19? 


Another Way to Report the Results 
of a Statistical Test: Attained 
Significance Levels, or p-Values 


As previously indicated, the probability α of a type I error is often called the 
significance level, or, more simply, the level of the test. Although small values of α are often 
recommended, the actual value of α to use in an analysis is somewhat arbitrary. One 
experimenter may choose to implement a test with α = .05 whereas another 
experimenter might prefer α = .01. It is possible, therefore, for two persons to analyze the 
same data and reach opposite conclusions, one concluding that the null hypothesis 
should be rejected at the α = .05 significance level and the other deciding that the 
null hypothesis should not be rejected with α = .01. Further, α-values of .05 or .01 
often are used out of habit or for the sake of convenience rather than as a result of 
careful consideration of the ramifications of making a type I error. 

Once a test statistic (Y in our polling example, or one of the Z's of Section 10.3) 
is decided on, it is often possible to report the p-value or attained significance level 
associated with a test. This quantity is a statistic representing the smallest value of α 
for which the null hypothesis can be rejected. 


If W is a test statistic, the p-value, or attained significance level, is the 
smallest level of significance α for which the observed data indicate that the null 
hypothesis should be rejected. 


The smaller the p-value becomes, the more compelling is the evidence that the null 
hypothesis should be rejected. Many scientific journals require researchers to report 
p-values associated with statistical tests because these values provide the reader 
with more information than is contained in a statement that the null hypothesis was 
rejected or not rejected for some value of α chosen by the researcher. If the p-value 
is small enough to be convincing to you, you should reject the null hypothesis. If 
an experimenter has a value of α in mind, the p-value can be used to implement an 
α-level test. The p-value is the smallest value of α for which the null hypothesis can 
be rejected. Thus, if the desired value of α is greater than or equal to the p-value, 
the null hypothesis is rejected for that value of α. Indeed, the null hypothesis should 
be rejected for any value of α down to and including the p-value. Otherwise, if α is 
less than the p-value, the null hypothesis cannot be rejected. In a sense, the p-value 
allows the reader of published research to evaluate the extent to which the observed 
data disagree with the null hypothesis. Particularly, the p-value permits each reader 
to use his or her own choice for α in deciding whether the observed data should lead 
to rejection of the null hypothesis. 

The procedures for finding p-values for the tests that we have discussed thus far 
are presented in the following examples. 


EXAMPLE 10.10 

Recall our discussion of the political poll (see Examples 10.1 through 10.4) where 
n = 15 voters were sampled. If we wish to test H₀: p = .5 versus Hₐ: p < .5, using 
Y = the number of voters favoring Jones as our test statistic, what is the p-value 
if Y = 3? Interpret the result. 


Solution 

In previous discussions, we noted that H₀ should be rejected for small values of 
Y. Thus, the p-value for this test is given by P{Y ≤ 3}, where Y has a binomial 
distribution with n = 15 and p = .5 (the shaded area in the binomial distribution of 
Figure 10.6). Using Table 1, Appendix 3, we find that the p-value is .018. 

FIGURE 10.6 Illustration of p-value for Example 10.10 

Because the p-value = .018 represents the smallest value of α for which the null 
hypothesis is rejected, an experimenter who specifies any value of α ≥ .018 would 
be led to reject H₀ and to conclude that Jones does not have a plurality of the vote. 
If the experimenter chose an α-value of less than .018, however, the null hypothesis 
could not be rejected. 
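The binomial tail probability in Example 10.10 can also be computed directly instead of being read from Table 1. This is a minimal sketch; the function name binomial_lower_tail is illustrative and only the Python standard library is assumed.

```python
from math import comb

def binomial_lower_tail(y_obs, n, p):
    """P(Y <= y_obs) for a binomial random variable Y with n trials and success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(y_obs + 1))

# Example 10.10: lower-tail test of H0: p = .5 with n = 15 and observed Y = 3
p_value = binomial_lower_tail(y_obs=3, n=15, p=0.5)
print(f"p-value = {p_value:.4f}")

# Decision at two illustrative levels: reject when alpha >= p-value
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 = {alpha >= p_value}")
```

The exact value is 576/32768, which rounds to the tabled .018; the test rejects at α = .05 but not at α = .01.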


This example illustrates that the reporting of p-values is particularly beneficial 
when the appropriate test statistic possesses a discrete distribution. In situations like 
these, one often cannot find any rejection region that yields an α-value of a specified 
magnitude. For example, in this instance, no rejection region of the form {y ≤ a} can 
be found for which α = .05. In such cases, reporting the p-value is usually preferable 
to limiting oneself to values of α that can be obtained on the basis of the discrete 
distribution of the test statistic. 
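To see why α = .05 cannot be attained in the polling example, one can list the level of every rejection region of the form {y ≤ a} for n = 15 and p = .5. The sketch below is illustrative; the variable names are assumptions, and only the Python standard library is used.

```python
from math import comb

n, p0 = 15, 0.5                      # polling example: n = 15 voters, H0: p = .5

# Level of the rejection region {y <= a} is P(Y <= a when p = p0), a cumulative sum
attainable_levels = []
cumulative = 0.0
for a in range(n + 1):
    cumulative += comb(n, a) * p0**a * (1 - p0)**(n - a)
    attainable_levels.append(cumulative)

for a in range(6):
    print(f"RR = {{y <= {a}}}: alpha = {attainable_levels[a]:.4f}")
```

The attainable levels jump from about .018 (a = 3) to about .059 (a = 4), so no region of this form has α exactly equal to .05.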

Example 10.10 also indicates the general method for computing p-values. If 
we were to reject H₀ in favor of Hₐ for small values of a test statistic W (say, 
RR: {w ≤ k}), the p-value associated with an observed value w₀ of W is given by 


p-value = P(W ≤ w₀, when H₀ is true). 


Analogously, if we were to reject H₀ in favor of Hₐ for large values of W (say, 
RR: {w ≥ k}), the p-value associated with the observed value w₀ is 


p-value = P(W ≥ w₀, when H₀ is true). 


Calculation of a p-value for a two-tailed alternative is illustrated in the following 
example. 


EXAMPLE 10.11 

Find the p-value for the statistical test of Example 10.7. 


Solution 

Example 10.7 presents a test of the null hypothesis H₀: μ₁ − μ₂ = 0 versus the 
alternative hypothesis Hₐ: μ₁ − μ₂ ≠ 0. The value of the test statistic, computed 
from the observed data, was z = −2.5. Because this test is two-tailed, the p-value is 
the probability that either Z ≤ −2.5 or Z ≥ 2.5 (the shaded areas in Figure 10.7). 
From Table 4, Appendix 3, we find that P(Z ≥ 2.5) = P(Z ≤ −2.5) = .0062. 
Because this is a two-tailed test, the p-value = 2(.0062) = .0124. Thus, if α = .05 
(a value larger than .0124), we reject H₀ in favor of Hₐ and, in agreement with the 
conclusion of Example 10.7, conclude that evidence of a difference in mean reaction 
time for men and women exists. However, if α = .01 (or any value of α < .0124) 
were chosen, we could not legitimately claim to have detected a difference in mean 
reaction times for the two sexes. 

FIGURE 10.7 Shaded areas give the p-value for Example 10.11. [Figure: standard normal curve 
with the tails beyond −2.5 and 2.5 shaded.] 
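The two-tailed p-value in Example 10.11 can be reproduced without the normal table. The following is a small sketch; the function name two_tailed_p_value is illustrative, and statistics.NormalDist from the Python standard library is assumed as a replacement for Table 4.

```python
from statistics import NormalDist

def two_tailed_p_value(z_obs):
    """p-value for a two-tailed Z test: P(Z <= -|z_obs|) + P(Z >= |z_obs|)."""
    return 2 * NormalDist().cdf(-abs(z_obs))

# Example 10.11: observed test statistic z = -2.5
p_value = two_tailed_p_value(-2.5)
print(f"p-value = {p_value:.4f}")

# Reject H0 whenever the chosen alpha is at least as large as the p-value
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 = {alpha >= p_value}")
```

The result agrees with the table-based value .0124 to four decimal places, and the two decisions match those stated in the example.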




For the statistical tests that we have developed thus far, the experimenter can 
compute exact p-values by using the binomial and Z tables in Appendix 3. The 
applet Normal Probabilities can also be used to compute p-values associated with the 
Z tests discussed in Sections 10.3 and 10.4. Tables (in the appendix) of distributions 
for some of the test statistics that we encounter in later sections give critical values 
only for largely differing values of α (for example, .10, .05, .025, .01, and .005). 
Consequently, such tables cannot be used to compute exact p-values. However, the 
tables provided in the appendix for the F, t, and χ² (and some other) distributions do 
permit us to determine a region of values inside which the p-value is known to lie. For 
example, if a test result is statistically significant for α = .05 but not for α = .025, 
we will report that .025 < p-value < .05. Thus, for any α ≥ .05, we reject the null 
hypothesis; for α ≤ .025, we do not reject the null hypothesis; and for values of α that 
fall between .025 and .05, we need to seek more complete tables of the appropriate 
distribution before reaching a conclusion. The tables in the appendix provide useful 
information about p-values, but the results are usually rather cumbersome. Exact 
p-values associated with test statistics with t, χ², and F distributions are easily 
obtained using the applets introduced in Chapter 7. Many calculators are also capable 
of computing exact p-values. 

The recommendation that a researcher report the p-value for a test and leave its 
interpretation to a reader does not violate the traditional (decision theoretic) statistical 
testing procedures described in the preceding sections. The reporting of a p-value 
simply leaves the decision regarding whether to reject the null hypothesis (with the 
associated potential of committing type I or type II errors) to the reader. Thus, the
responsibility of choosing α and, possibly, the problem of evaluating the probability
β of making a type II error are shifted to the reader.


Exercises 


High airline occupancy rates on scheduled flights are essential for profitability. Suppose that a 
scheduled flight must average at least 60% occupancy to be profitable and that an examination of 
the occupancy rates for 120 10:00 A.M. flights from Atlanta to Dallas showed mean occupancy 
rate per flight of 58% and standard deviation 11%. Test to see if sufficient evidence exists to 
support a claim that the flight is unprofitable. Find the p-value associated with the test. What 
would you conclude if you wished to implement the test at the α = .10 level?


Two sets of elementary schoolchildren were taught to read by using different methods, 50 by 
each method. At the conclusion of the instructional period, a reading test yielded the results 
ȳ1 = 74, ȳ2 = 71, s1 = 9, and s2 = 10.


a What is the attained significance level if you wish to see whether evidence indicates a 
difference between the two population means? 


b What would you conclude if you desired an α-value of .05?


A biologist has hypothesized that high concentrations of actinomycin D inhibit RNA synthesis 
in cells and thereby inhibit the production of proteins. An experiment conducted to test this 
theory compared the RNA synthesis in cells treated with two concentrations of actinomycin D: 
0.6 and 0.7 micrograms per liter. Cells treated with the lower concentration (0.6) of actinomycin 
D yielded that 55 out of 70 developed normally whereas only 23 out of 70 appeared to develop 
normally for the higher concentration (0.7). Do these data indicate that the rate of normal RNA 
synthesis is lower for cells exposed to the higher concentrations of actinomycin D? 


a Find the p-value for the test. 


b If you chose to use α = .05, what is your conclusion?


How would you like to live to be 200 years old? For centuries, humankind has sought the key 
to the mystery of aging. What causes aging? How can aging be slowed? Studies have focused 
on biomarkers, physical or biological changes that occur at a predictable time in a person’s life. 
The theory is that, if ways can be found to delay the occurrence of these biomarkers, human 
life can be extended. A key biomarker, according to scientists, is forced vital capacity (FVC), 
the volume of air that a person can expel after taking a deep breath. A study of 5209 men and 
women aged 30 to 62 showed that FVC declined, on the average, 3.8 deciliters (dl) per decade 
for men and 3.1 deciliters per decade for women.¹⁰ Suppose that you wished to determine
whether a physical fitness program for men and women aged 50 to 60 would delay aging; to 
do so, you measured the FVC for 30 men and 30 women participating in the fitness program 
at the beginning and end of the 50- to 60-year age interval and recorded the drop in FVC for 
each person. A summary of the data appears in the accompanying table. 


                                     Men     Women

Sample size                          30      30
Sample average drop in FVC (dl)      3.6     2.7
Sample standard deviation (dl)       1.1     1.2
Population mean drop in FVC          μ1      μ2


10. Source: T. Boddé, “Biomarkers of Aging: Key to a Younger Life,” Bioscience 31(8) (1981): 566–567.

a Do the data provide sufficient evidence to indicate that the decrease in the mean FVC over
the decade for the men on the physical fitness program is less than 3.8 dl? Find the attained
significance level for the test.

b Refer to part (a). If you choose α = .05, do the data support the contention that the mean
decrease in FVC is less than 3.8 dl?

c Test to determine whether the FVC drop for women on the physical fitness program was
less than 3.1 dl for the decade. Find the attained significance level for the test.

d Refer to part (c). If you choose α = .05, do the data support the contention that the mean
decrease in FVC is less than 3.1 dl? 


Do you believe that an exceptionally high percentage of the executives of large corporations 
are right-handed? Although 85% of the general public is right-handed, a survey of 300 chief 
executive officers of large corporations found that 96% were right-handed. 


a Is this difference in percentages statistically significant? Test using α = .01.


b Find the p-value for the test and explain what it means.


A check-cashing service found that approximately 5% of all checks submitted to the service 
were bad. After instituting a check-verification system to reduce its losses, the service found 
that only 45 checks were bad in a random sample of 1124 that were cashed. Does sufficient 
evidence exist to affirm that the check-verification system reduced the proportion of bad checks? 
What attained significance level is associated with the test? What would you conclude at the 
α = .01 level?


A pharmaceutical company conducted an experiment to compare the mean times (in days) 
necessary to recover from the effects and complications that follow the onset of the common 
cold. This experiment compared persons on a daily dose of 500 milligrams (mg) of vitamin C 
to those who were not given a vitamin supplement. For each treatment category, 35 adults were 
randomly selected, and the mean recovery times and standard deviations for the two groups 
were found to be as given in the accompanying table. 


                               Treatment

                               No Supplement    500 mg Vitamin C

Sample size                    35               35
Sample mean                    6.9              5.8
Sample standard deviation      2.9              1.2


a Do the data indicate that the use of vitamin C reduces the mean time required to recover? 
Find the attained significance level. 


b What would the company conclude at the α = .05 level?


A publisher of a newsmagazine has found through past experience that 60% of subscribers 
renew their subscriptions. In a random sample of 200 subscribers, 108 indicated that they 
planned to renew their subscriptions. What is the p-value associated with the test that the 
current rate of renewals differs from the rate previously experienced? 


In a study to assess various effects of using a female model in automobile advertising, each of 
100 male subjects was shown photographs of two automobiles matched for price, color, and 
size but of different makes. Fifty of the subjects (group A) were shown automobile 1 with a 
female model and automobile 2 with no model. Both automobiles were shown without the 
model to the other 50 subjects (group B). In group A, automobile 1 (shown with the model) 
was judged to be more expensive by 37 subjects. In group B, automobile 1 was judged to be
more expensive by 23 subjects. Do these results indicate that using a female model increases
the perceived cost of an automobile? Find the associated p-value and indicate your conclusion
for an α = .05 level test.


Some Comments on the Theory of Hypothesis Testing


As previously indicated, we can choose between implementing a one-tailed or a
two-tailed test for a given situation. This choice is dictated by the practical aspects of the
problem and depends on the alternative value of the parameter θ that the experimenter
is trying to detect. If we stood to suffer a large financial loss if θ were greater than
θ0 but not if it were less, we would concentrate our attention on detecting values of θ
greater than θ0. Hence, we would reject in the upper tail of the distribution for the test
statistics previously discussed. On the other hand, if we were equally interested in
detecting values of θ less than or greater than θ0, we would employ a two-tailed test.

The theory of statistical tests of hypotheses (outlined in Section 10.2 and used
in Section 10.3) is a very clear-cut procedure that enables the researcher either to
reject or to accept the null hypothesis, with measured risk α or β. Unfortunately, this
theoretical framework does not suffice for all practical situations.

For any statistical test, the probability α of a type I error depends on the value of
the parameter specified in the null hypothesis. This probability can be calculated, at
least approximately, for each of the testing procedures discussed in this text. For the
procedures discussed thus far, the probability β of a type II error can be calculated only
after a specific value of the parameter of interest has been singled out for consideration.
The selection of a practically meaningful value for this parameter is often difficult.
Even if a meaningful alternative can be identified, the actual calculation of β is
sometimes quite tedious. Specification of a meaningful alternative hypothesis is even
more difficult for some of the testing procedures that we will consider in subsequent
chapters.
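To make the dependence of β on a chosen alternative concrete, here is a small sketch based on the polling test recalled later in this section (n = 15 voters, rejection region {y ≤ 2}, H0: p = .5). The alternative p = .3 is a hypothetical value picked only for illustration, and the helper name is not from the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(Y <= k) for Y ~ binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

n, cutoff = 15, 2          # sample size and rejection region {y <= 2}

# alpha: probability of landing in the rejection region when H0 (p = .5) is true.
alpha = binom_cdf(cutoff, n, 0.5)

# beta: probability of NOT rejecting when a specific alternative is true.
p_alt = 0.3                # hypothetical alternative, singled out for illustration
beta = 1.0 - binom_cdf(cutoff, n, p_alt)

print(f"alpha = {alpha:.4f}")
print(f"beta at p = {p_alt}: {beta:.4f}")
```

Changing p_alt changes β, which is the point of the paragraph: no single value of β exists until an alternative has been specified.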

Of course, we do not want to ignore the possibility of committing a type II error.
Later in this chapter, we will determine methods for selecting tests with the smallest
possible value of β for tests where α, the probability of a type I error, is a fixed value
selected by the researcher. Even in these situations, however, the smallest possible
value of β can be quite large.

These obstacles do not invalidate the use of statistical tests; rather, they urge us to be
cautious about drawing conclusions where insufficient evidence is available to permit
rejection of the null hypothesis. If a truly meaningful value for β can be calculated,
we should feel justified in accepting H0 if the value of β is small and the value of the
test statistic falls outside the rejection region. In the more typical situation where a
truly meaningful value for β is unavailable, we will modify our procedure as follows.

When the value of the test statistic is not in the rejection region, we will “fail to
reject” rather than “accept” the null hypothesis. In the polling example discussed in
Example 10.1, we tested H0: p = .5 versus Ha: p < .5. If our observed value of
Y falls into the rejection region, we reject H0 and say that the evidence supports the
research hypothesis that Jones will lose. In this situation, we will have demonstrated
support for the hypothesis we wanted to support: the research hypothesis. If, however,
Y does not fall in the rejection region and we can determine no specific value of p
in Ha that is of direct interest, we simply state that we will not reject H0 and must
seek additional information before reaching a conclusion. Alternatively, we could
report the p-value associated with the statistical test and leave the interpretation to
the reader.

If H0 is rejected for a “small” value of α (or for a small p-value), this occurrence
does not imply that the null hypothesis is “wrong by a large amount.” It does mean that
the null hypothesis can be rejected based on a procedure that incorrectly rejects the
null hypothesis (when H0 is true) with a small probability (that is, with a small
probability of a type I error). We also must refrain from equating statistical with practical
significance. If we consider the experiment described and analyzed in Examples 10.7
and 10.11, the p-value of .0124 is “small,” and the result is statistically significant for
any choice of α ≥ .0124. However, the difference between the mean reaction times
for the two samples is only .2 second, a result that may or may not be practically
significant. To assess the practical significance of such a difference, you may wish to
form a confidence interval for μ1 − μ2 by using the methods of Section 8.6.

Finally, some comments are in order regarding the choice of the null hypotheses that
we have used, particularly in the one-sided tests. For example, in Example 10.1, we
identified the appropriate alternative hypothesis as Ha: p < .5 and used H0: p = .5
as our null hypothesis. The test statistic was Y = the number of voters who favored
Jones in a sample of size n = 15. One rejection region that we considered was
{y ≤ 2}. You might wonder why we did not use H0*: p ≥ .5 as the null hypothesis.
This makes a lot of sense, because every possible value of p is either in H0*: p ≥ .5
or in Ha: p < .5.

So why did we use H0: p = .5? The brief answer is that what we really care
about is the alternative hypothesis Ha: p < .5; the null hypothesis is not our primary
concern. As previously discussed, we usually do not actually accept the null hypothesis
anyway, regardless of its form. In addition, H0: p = .5 is easier to deal with and leads
to exactly the same conclusions at the same α-value without requiring us to develop
additional theory to deal with the more complicated H0*: p ≥ .5. When we used
H0: p = .5 as our null hypothesis, calculating the α-level of the test was relatively
simple: We just found P(Y ≤ 2 when p = .5). If we had used H0*: p ≥ .5 as the
null hypothesis, our previous definition of α would have been inadequate because
the value of P(Y ≤ 2) is actually a function of p for p ≥ .5. In cases like these,
α is defined to be the maximum (over all values of p ≥ .5) value of P(Y ≤ 2).
Although we will not derive this result here, the maximum of P(Y ≤ 2) over p ≥ .5
occurs when p = .5, the “boundary” value of p in H0*: p ≥ .5. Thus, we get the
“right” value of α if we use the simpler null hypothesis H0: p = .5.
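The claim that the maximum occurs at the boundary is stated without derivation, but it can be checked numerically for this particular test. The sketch below evaluates P(Y ≤ 2) on a grid of values p = .50, .51, ..., 1.00 for n = 15; the grid spacing and the function name are choices made here for illustration, and a grid check is of course not a proof.

```python
from math import comb

def prob_reject(p, n=15, cutoff=2):
    """Return P(Y <= cutoff) for Y ~ binomial(n, p): the chance of landing in RR."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(cutoff + 1))

# Every value of p allowed by the composite null H0*: p >= .5, on a grid of step .01.
grid = [(50 + i) / 100 for i in range(51)]
probs = [prob_reject(p) for p in grid]

# Locate the grid value of p at which the rejection probability is largest.
p_at_max = grid[probs.index(max(probs))]

print(f"largest P(Y <= 2) on the grid: {max(probs):.6f}")
print(f"attained at p = {p_at_max}")
```

On this grid the rejection probability falls steadily as p moves away from .5, so the largest value is the one computed under the simple null hypothesis p = .5.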

Similar statements are true for all of the tests that we have considered thus far and
that we will consider in future discussions. That is, if we consider Ha: θ > θ0 to
be the appropriate research hypothesis, α = max over θ ≤ θ0 of P(test statistic in RR)
typically occurs when θ = θ0, the “boundary” value of θ. Similarly, if Ha: θ < θ0 is the
appropriate research hypothesis, α = max over θ ≥ θ0 of P(test statistic in RR) typically
occurs when θ = θ0. Thus, using H0: θ = θ0 instead of H0*: θ ≥ θ0 leads to the correct
testing procedure and the correct calculation of α, without needlessly raising additional
considerations.


Exercises


Applet Exercise Use the applet Hypothesis Testing (for Proportions) (refer to Exercises
10.9–10.16) to complete the following. Set up the applet to simulate the results of tests of
H0: p = .8 versus Ha: p > .8, using α = .2 and samples of size n = 30. Click the button
“Clear Summary” to erase the results of any previous simulations.


a Set the true value of p to .8 and implement at least 200 simulated tests. What proportion
of simulations results in rejection of the null hypothesis?

b Leave all settings at their previous values except change the true value of p to .75. Implement
at least 200 simulated tests and observe the proportion of the simulations that led to rejection
of the null hypothesis. Repeat, setting the true value of p to .7 and again with the true value
of p = .65.

c What would you expect to happen if the simulation was repeated after setting the true value
of p to any value less than .65? Try it.

d Click the button “Show Summary.” Which of the true p's used in the simulations resulted
in the largest proportion of simulated tests that rejected the null and accepted the alternative,
Ha: p > .8? Does this confirm any statements made in the last paragraph of Section 10.7?
Which statement?


Applet Exercise Refer to Exercise 10.59. Set up the applet to simulate the results of tests of
H0: p = .4 versus Ha: p < .4, using α = .2 and samples of size n = 30. Click the button
“Clear Summary” to erase the results of any previous simulations.


a Set the true value of p to .4 and implement at least 200 simulated tests. What proportion
of simulations result in rejection of the null hypothesis?

b Leave all settings at their previous values except change the true value of p to .45. Implement
at least 200 simulated tests and observe the proportion of the simulations that led to rejection
of the null hypothesis. Repeat, setting the true value of p to .5, then to .55.

c What would you expect to happen if the simulation was repeated after setting the true value
of p to any value greater than .55? Try it.

d Click the button “Show Summary.” Which of the true p's used in the simulations resulted in
the largest proportion of simulated tests that rejected the null and accepted the alternative,
Ha: p < .4? Does this confirm any statements made in the last paragraph of Section 10.7?
Which statement?


Small-Sample Hypothesis Testing for μ and μ1 − μ2


In Section 10.3, we discussed large-sample hypothesis testing procedures that, like the
interval estimation procedures developed in Section 8.6, are useful for large samples.
For these procedures to be applicable, the sample size must be large enough that
Z = (θ̂ − θ0)/σ_θ̂ has approximately a standard normal distribution. Section 8.8
contains procedures based on the t distribution for constructing confidence intervals
for μ (the mean of a single normal population) and μ1 − μ2 (the difference in the
means of two normal populations with equal variances). In this section, we develop
formal procedures for testing hypotheses about μ and μ1 − μ2, procedures that are
appropriate for small samples from normal populations.

We assume that Y1, Y2, ..., Yn denote a random sample of size n from a normal
distribution with unknown mean μ and unknown variance σ². If Ȳ and S denote
the sample mean and sample standard deviation, respectively, and if H0: μ = μ0 is
true, then

\[ T = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}} \]

has a t distribution with n − 1 df (see Section 8.8).

Because the t distribution is symmetric and mound-shaped, the rejection region for
a small-sample test of the hypothesis H0: μ = μ0 must be located in the tails of the t
distribution and be determined in a manner similar to that used with the large-sample
Z statistic. By analogy with the Z test developed in Section 10.3, the proper rejection
region for the upper-tail alternative Ha: μ > μ0 is given by


RR = {t > tα},


where tα is such that P(T > tα) = α for a t distribution with n − 1 df (see Table 5,
Appendix 3).

A summary of the tests for μ based on the t distribution, known as t tests, is as
follows.


A Small-Sample Test for μ


Assumptions: Y1, Y2, ..., Yn constitute a random sample from a normal
distribution with E(Yi) = μ.

H0: μ = μ0.

Ha: μ > μ0 (upper-tail alternative).
    μ < μ0 (lower-tail alternative).
    μ ≠ μ0 (two-tailed alternative).

Test statistic:

\[ T = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}} \]

Rejection region: t > tα (upper-tail RR).
                  t < −tα (lower-tail RR).
                  |t| > tα/2 (two-tailed RR).

(See Table 5, Appendix 3, for values of tα, with ν = n − 1 df.)


EXAMPLE 10.12

Example 8.11 gives muzzle velocities of eight shells tested with a new gunpowder,
along with the sample mean and sample standard deviation, ȳ = 2959 and s = 39.1.
The manufacturer claims that the new gunpowder produces an average velocity of
not less than 3000 feet per second. Do the sample data provide sufficient evidence to
contradict the manufacturer's claim at the .025 level of significance?

Solution

Assuming that muzzle velocities are approximately normally distributed, we can
use the test just outlined. We want to test H0: μ = 3000 versus the alternative,
Ha: μ < 3000. The rejection region is given by t < −t.025 = −2.365, where t
possesses ν = (n − 1) = 7 df. Computing, we find that the observed value of the test
statistic is

\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{2959 - 3000}{39.1/\sqrt{8}} = -2.966. \]

This value falls in the rejection region (that is, t = −2.966 is less than −2.365); hence,
the null hypothesis is rejected at the α = .025 level of significance. We conclude that
sufficient evidence exists to contradict the manufacturer's claim and that the true mean
muzzle velocity is less than 3000 feet per second at the .025 level of significance.
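The computation in Example 10.12 can be reproduced in a few lines. This is a minimal sketch using only the summary statistics quoted in the example; the critical value 2.365 is typed in from the text rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics quoted in Example 10.12.
n, y_bar, s = 8, 2959.0, 39.1
mu_0 = 3000.0              # value of the mean under H0
t_crit = 2.365             # t_.025 with 7 df, as quoted from Table 5

# One-sample t statistic: (sample mean - hypothesized mean) / (s / sqrt(n)).
t_obs = (y_bar - mu_0) / (s / math.sqrt(n))

# Lower-tail test: reject H0 when the statistic falls below -t_crit.
reject = t_obs < -t_crit

print(f"t = {t_obs:.3f}, df = {n - 1}")
print("reject H0" if reject else "do not reject H0")
```

The statistic rounds to −2.966, matching the value in the example, and it lies below −2.365, so the lower-tail test rejects.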


EXAMPLE 10.13

What is the p-value associated with the statistical test in Example 10.12?

Solution

Because the null hypothesis should be rejected if t is “small,” the smallest value of α
for which the null hypothesis can be rejected is p-value = P(T ≤ −2.966), where
T has a t distribution with n − 1 = 7 df.

Unlike the table of areas under the normal curve (Table 4, Appendix 3), Table 5 in
Appendix 3 does not give areas corresponding to many values of t. Rather, it gives the
values of t corresponding to upper-tail areas equal to .10, .05, .025, .010, and .005.
Because the t distribution is symmetric about 0, we can use these upper-tail areas
to provide corresponding lower-tail areas. In this instance, the t statistic is based on
7 df; hence, we consult the df = 7 row of Table 5 and find that −2.966 falls between
−t.025 = −2.365 and −t.01 = −2.998. These values are indicated in Figure 10.8.
Because the observed value of T (−2.966) is less than −t.025 = −2.365 but not less
than −t.01 = −2.998, we reject H0 for α = .025 but not for α = .01. Thus, the
p-value for the test satisfies .01 < p-value < .025.

The exact p-value is easily obtained using the applet Student's t Probabilities and
Quantiles (accessible at www.thomsonedu.com/statistics/wackerly). Using the applet
with 7 df, we obtain p-value = P(T ≤ −2.966) = P(T ≥ 2.966) = .01046, a value
that is indeed between .01 and .025. Thus, the data indicate that the manufacturer's
claim should be rejected for any choice of α ≥ .01046.

FIGURE 10.8 Bounding the p-value for Example 10.13, using Table 5, Appendix 3
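For readers without access to the applet, the exact tail area can be approximated directly from the t density. The sketch below is one possible approach, not the applet's method: it integrates the Student's t density over [0, t] with Simpson's rule and uses symmetry about 0. The function names and the number of integration steps are assumptions made for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """Approximate P(T > t_obs) for t_obs >= 0.

    Uses P(T > t) = 0.5 - integral of the density over [0, t], with the
    integral evaluated by composite Simpson's rule (steps must be even).
    """
    h = t_obs / steps
    total = t_pdf(0.0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Lower-tail p-value of Example 10.13: P(T <= -2.966) = P(T >= 2.966) with 7 df.
p_value = t_upper_tail(2.966, 7)
print(f"p-value = {p_value:.5f}")
```

The result should agree closely with the applet value .01046 reported in the example and falls inside the bounds .01 < p-value < .025 obtained from the table.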


A second application of the t distribution is in constructing a small-sample test to
compare the means of two normal populations that possess equal variances. Suppose
that independent random samples are selected from each of two normal populations:
Y11, Y12, ..., Y1n1 from the first and Y21, Y22, ..., Y2n2 from the second, where the
mean and variance of the ith population are μi and σ², for i = 1, 2. Further, assume
that Ȳi and Si², for i = 1, 2, are the corresponding sample means and variances. When
these assumptions are satisfied, we showed in Section 8.8 that if

\[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]

is the pooled estimator for σ², then

\[ T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

has a Student's t distribution with n1 + n2 − 2 df. If we want to test the null
hypothesis H0: μ1 − μ2 = D0 for some fixed value D0, it follows that, if H0 is true, then

\[ T = \frac{\bar{Y}_1 - \bar{Y}_2 - D_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

has a Student's t distribution with n1 + n2 − 2 df. Notice that this small-sample test
statistic resembles its large-sample counterpart, the Z-statistic of Section 10.3. Tests
of the hypothesis H0: μ1 − μ2 = D0 versus upper-tail, lower-tail, and two-tailed
alternatives are conducted in the same manner as in the large-sample test except that
we employ the t statistic and tables of the t distribution to reach our conclusions. A
summary of the small-sample testing procedures for μ1 − μ2 follows.


Small-Sample Tests for Comparing Two Population Means


Assumptions: Independent samples from normal distributions with σ1² = σ2².

H0: μ1 − μ2 = D0.

Ha: μ1 − μ2 > D0 (upper-tail alternative).
    μ1 − μ2 < D0 (lower-tail alternative).
    μ1 − μ2 ≠ D0 (two-tailed alternative).

Test statistic:

\[ T = \frac{\bar{Y}_1 - \bar{Y}_2 - D_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}. \]

Rejection region: t > tα (upper-tail RR).
                  t < −tα (lower-tail RR).
                  |t| > tα/2 (two-tailed RR).

Here, P(T > tα) = α and degrees of freedom ν = n1 + n2 − 2. (See
Table 5, Appendix 3.)


EXAMPLE 10.14

Example 8.12 gives data on the length of time required to complete an assembly
procedure using each of two different training methods. The sample data are as
shown in Table 10.3. Is there sufficient evidence to indicate a difference in true mean
assembly times for those trained using the two methods? Test at the α = .05 level of
significance.


Table 10.3 Data for Example 10.14

Standard Procedure              New Procedure
n1 = 9                          n2 = 9
ȳ1 = 35.22 seconds              ȳ2 = 31.56 seconds
Σ (y1i − ȳ1)² = 195.56          Σ (y2i − ȳ2)² = 160.22

(Each sum runs over i = 1, ..., 9.)

Solution

We are testing H0: (μ1 − μ2) = 0 against the alternative Ha: (μ1 − μ2) ≠ 0.
Consequently, we must use a two-tailed test. The test statistic is

\[ T = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

with D0 = 0, and the rejection region for α = .05 is |t| > tα/2 = t.025. In this case,
t.025 = 2.120 because t is based on (n1 + n2 − 2) = 9 + 9 − 2 = 16 df.
The observed value of the test statistic is found by first computing

\[ s_p = \sqrt{s_p^2} = \sqrt{\frac{195.56 + 160.22}{9 + 9 - 2}} = \sqrt{22.24} = 4.716. \]

Then,

\[ t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{35.22 - 31.56}{4.716\sqrt{\frac{1}{9} + \frac{1}{9}}} = 1.65. \]

This value does not fall in the rejection region (|t| > 2.120); hence, the null hypothesis
is not rejected. There is insufficient evidence to indicate a difference in the mean
assembly times for the two training methods at the α = .05 level of significance.
Notice that, in line with the comments of Section 10.7, we have not accepted
H0: μ1 − μ2 = 0. Rather, we have stated that we lack sufficient evidence to reject
H0 and to accept the alternative Ha: μ1 − μ2 ≠ 0.
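The pooled two-sample computation of Example 10.14 can be sketched as follows. Only the summary values of Table 10.3 are used; the critical value 2.120 is typed in from the text, and the variable names are chosen here for illustration.

```python
import math

# Summary data from Table 10.3.
n1, n2 = 9, 9
y_bar1, y_bar2 = 35.22, 31.56
ss1, ss2 = 195.56, 160.22      # sums of squared deviations about each sample mean
d0 = 0.0                       # hypothesized difference under H0
t_crit = 2.120                 # t_.025 with 16 df, as quoted from Table 5

# Pooled standard deviation: the sums of squares equal (n_i - 1) * s_i^2.
df = n1 + n2 - 2
s_p = math.sqrt((ss1 + ss2) / df)

# Two-sample t statistic with pooled standard deviation.
t_obs = (y_bar1 - y_bar2 - d0) / (s_p * math.sqrt(1 / n1 + 1 / n2))

# Two-tailed test: reject H0 when |t| exceeds the critical value.
reject = abs(t_obs) > t_crit

print(f"s_p = {s_p:.3f}, t = {t_obs:.2f}, df = {df}")
print("reject H0" if reject else "do not reject H0")
```

Because the table already supplies the sums of squared deviations, the numerator of the pooled variance is simply their sum; with raw data one would first compute each sample variance.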


EXAMPLE 10.15

Find the p-value for the statistical test of Example 10.14.

Solution

The observed value of the test statistic for this two-tailed test was t = 1.65. The
p-value for this test is thus the probability that T > 1.65 or T < −1.65, the areas
shaded in Figure 10.9; that is, A1 + A2.

Because this test statistic is based on n1 + n2 − 2 = 16 df, we consult Table 5,
Appendix 3, to find t.05 = 1.746 and t.10 = 1.337. Thus, A1 = P(T > 1.65) lies
between .05 and .10; that is, .05 < A1 < .1. Similarly, .05 < A2 < .1. Because the
p-value = A1 + A2, it follows that .1 < p-value < .2.

FIGURE 10.9 Shaded areas are the p-value for Example 10.15

The applet Student’s t Probabilities and Quantiles yields that, with 16 df, A1 =
P(T > 1.65) = .0592 = A2 and that the exact p-value = .1184. Thus, the smallest
value of α for which the data indicate a difference in the mean assembly times for
those trained using the two methods is .1184.

Whether the p-value is determined exactly using the applet or bounded using
Table 5, Appendix 3, if we select α = .05, we cannot reject the null hypothesis.
This is the same conclusion that we reached in Example 10.14 where we formally
implemented the .05 level test.
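The table-based bounding used in Examples 10.13 and 10.15 follows a simple rule: find the two adjacent tabled critical values that bracket the observed statistic, and read off their tail areas. A minimal sketch of that rule is given below. The function name and data layout are illustrative; the 16 df row contains the values 1.337, 1.746, and 2.120 quoted in the text, while 2.583 and 2.921 are taken here as the usual tabled values for areas .010 and .005 and should be checked against Table 5.

```python
def bound_upper_tail(t_abs, table_row):
    """Bound P(T > t_abs) using one row of a t table.

    table_row is a list of (upper_tail_area, critical_value) pairs.
    Returns (low, high) with low < P(T > t_abs) < high; None marks a side
    on which the table gives no bound.
    """
    low, high = None, None
    # Walk from the largest tail area (smallest critical value) downward.
    for area, crit in sorted(table_row, reverse=True):
        if t_abs > crit:
            high = area          # the tail area is smaller than this tabled area
        else:
            low = area           # the tail area is larger than this tabled area
            break
    return low, high

# Row for 16 df (upper-tail areas .100, .050, .025, .010, .005).
row_16_df = [(0.100, 1.337), (0.050, 1.746), (0.025, 2.120),
             (0.010, 2.583), (0.005, 2.921)]

low, high = bound_upper_tail(1.65, row_16_df)
two_tailed = (2 * low, 2 * high)   # a two-tailed p-value doubles each bound
print(f"{low} < A1 < {high}")
print(f"{two_tailed[0]} < p-value < {two_tailed[1]}")
```

For a one-tailed test the bounds on the single tail area are reported directly; the doubling step applies only to two-tailed alternatives such as the one in Example 10.15.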


The test of Example 10.12 is based on the assumption that the muzzle velocity 
measurements have been randomly selected from a normal population. In most cases, 
it is impossible to verify this assumption. We might ask how this predicament affects 
the validity of our conclusions. 

Empirical studies of the test statistic

\[ \frac{\bar{Y} - \mu}{S/\sqrt{n}} \]

have been conducted by sampling from many populations with nonnormal distributions.
Such investigations have shown that moderate departures from normality in the
distribution of the population have little effect on the probability distribution of the
test statistic. This result, coupled with the common occurrence of near-normal
distributions of data in nature, makes the t test of a population mean extremely useful.
Statistical tests that lack sensitivity to departures from the assumptions upon which
they are based possess wide applicability. Because of their insensitivity to formal
assumption violations, they are called robust statistical tests.
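An empirical study of this kind can be imitated with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the studies mentioned: it draws samples of size n = 8 from a uniform(0, 1) population (a symmetric, nonnormal distribution chosen here as a moderate departure from normality), applies the two-tailed t test of H0: μ = .5 at the nominal α = .05 level with the 7 df critical value 2.365, and records how often the true null hypothesis is rejected.

```python
import math
import random

def t_statistic(sample, mu_0):
    """One-sample t statistic (Y-bar - mu_0) / (S / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((y - mean) ** 2 for y in sample) / (n - 1)
    return (mean - mu_0) / math.sqrt(var / n)

def rejection_rate(draw, mu_0, n, t_crit, reps, seed=0):
    """Fraction of simulated samples for which |t| exceeds t_crit."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample, mu_0)) > t_crit:
            rejections += 1
    return rejections / reps

# Uniform(0, 1) population: nonnormal, with true mean .5, so H0 is true.
rate = rejection_rate(lambda rng: rng.random(), mu_0=0.5, n=8,
                      t_crit=2.365, reps=20000)
print(f"empirical type I error rate: {rate:.3f} (nominal .05)")
```

If the t test is robust in the sense described, the empirical rejection rate should stay near the nominal .05 even though the population is not normal. Strongly skewed populations or very small samples can behave differently, which is why the text speaks of moderate departures.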

Like the t test for a single mean, the t test for comparing two population means
(often called the two-sample t test) is robust relative to the assumption of normality.
It is also robust relative to the assumption that σ1² = σ2² when n1 and n2 are equal (or
nearly equal).

Finally, the duality between tests and confidence intervals that we considered in
Section 10.6 holds for the tests based on the t distributions that we considered in this
section and the confidence intervals presented in Section 8.8.


Exercises


10.61 Why is the Z test usually inappropriate as a test procedure when the sample size is small?


10.62 What assumptions are made when a Student's t test is employed to test a hypothesis involving
a population mean?


10.63 A chemical process has produced, on the average, 800 tons of chemical per day. The daily
yields for the past week are 785, 805, 790, 793, and 802 tons.


a Do these data indicate that the average yield is less than 800 tons and hence that something
is wrong with the process? Test at the 5% level of significance. What assumptions must be
satisfied in order for the procedure that you used to analyze these data to be valid?

b Use Table 5, Appendix 3, to give bounds for the associated p-value.

c Applet Exercise Use the applet Student’s t Probabilities and Quantiles to find the exact
p-value. Does the exact p-value satisfy the bounds that you obtained in part (b)?

d Use the p-value from part (c) to decide at the 5% significance level whether something
is wrong with the process. Does your conclusion agree with the one that you reached in
part (a)?


10.64 A coin-operated soft-drink machine was designed to discharge on the average 7 ounces of
beverage per cup. In a test of the machine, ten cupfuls of beverage were drawn from the
machine and measured. The mean and standard deviation of the ten measurements were 7.1
ounces and .12 ounce, respectively. Do these data present sufficient evidence to indicate that
the mean discharge differs from 7 ounces?


a What can be said about the attained significance level for this test based on the t table in
the appendix?

b Applet Exercise Find the exact p-value by using the applet Student's t Probabilities and
Quantiles.

c What is the appropriate decision if α = .10?


10.65 Operators of gasoline-fueled vehicles complain about the price of gasoline in gas stations.
According to the American Petroleum Institute, the federal gas tax per gallon is constant (18.4¢
as of January 13, 2005), but state and local taxes vary from 7.5¢ to 32.10¢ for n = 18 key
metropolitan areas around the country.¹¹ The total tax per gallon for gasoline at each of these 18
locations is given next. Suppose that these measurements constitute a random sample of size 18:


42.89 53.91 48.55 47.90 47.73 46.61
40.45 39.65 38.65 37.95 36.80 35.95
35.09 35.04 34.95 33.45 28.99 27.45


a Is there sufficient evidence to claim that the average per gallon gas tax is less than 45¢?
Use the t table in the appendix to bound the p-value associated with the test.

b Applet Exercise What is the exact p-value?

c Construct a 95% confidence interval for the average per gallon gas tax in the United States.


11. Source: “Gasoline Tax Rates by State,” http://www.gaspricewatch.com/usgastaxes.asp, 13 January
2005.


10.66 Researchers have shown that cigarette smoking has a deleterious effect on lung function. In
their study of the effect of cigarette smoking on the carbon monoxide diffusing capacity (DL)
of the lung, Ronald Knudson, W. Kaltenborn and B. Burrows found that current smokers had
DL readings significantly lower than either ex-smokers or nonsmokers.¹² The carbon monoxide
diffusing capacity for a random sample of current smokers was as follows:


103.768 88.602 73.003 123.086 91.052 

92.205 61.675 90.677 84.023 76.014 
100.615 88.017 71.210 82.115 89.222 
102.754 108.579 73.154 106.755 90.479 


Do these data indicate that the mean DL reading for current smokers is lower than 100, the 
average DL reading for nonsmokers? 


a Test at the $\alpha = .01$ level.
b Bound the p-value using a table in the appendix. 
c Applet Exercise Find the exact p-value. 


Nutritional information provided by Kentucky Fried Chicken (KFC) claims that each small 
bag of potato wedges contains 4.8 ounces of food and 280 calories. A sample of ten orders 
from KFC restaurants in New York and New Jersey averaged 358 calories.¹³


a If the sample standard deviation was $s = 54$, is there sufficient evidence to indicate that the
average number of calories in small bags of KFC potato wedges is greater than advertised? 
Test at the 1% level of significance. 

b Construct a 99% lower confidence bound for the true mean number of calories in small
bags of KFC potato wedges. 


c On the basis of the bound you obtained in part (b), what would you conclude about the claim
that the mean number of calories exceeds 280? How does your conclusion here compare 
with your conclusion in part (a) where you conducted a formal test of hypothesis? 


What assumptions are made about the populations from which independent random samples 
are obtained when the t distribution is used to make small-sample inferences concerning the
differences in population means? 


Two methods for teaching reading were applied to two randomly selected groups of elementary 
schoolchildren and then compared on the basis of a reading comprehension test given at the 
end of the learning period. The sample means and variances computed from the test scores are 
shown in the accompanying table. 


                               Method I   Method II

Number of children in group       11         14
$\bar{y}$                         64         69
$s^2$                             52         71


Do the data present sufficient evidence to indicate a difference in the mean scores for the 
populations associated with the two teaching methods? 


a What can be said about the attained significance level, using the appropriate table in the
appendix?

b Applet Exercise What can be said about the attained significance level, using the appropriate
applet?

c What assumptions are required?

d What would you conclude at the $\alpha = .05$ level of significance?


12. Source: Ronald Knudson, W. Kaltenborn, and B. Burrows, “The Effects of Cigarette Smoking and
Smoking Cessation on the Carbon Monoxide Diffusing Capacity of the Lung in Asymptomatic Subjects,”
American Review of Respiratory Diseases 140 (1989) 645–651.

13. Source: “KFC: Too Finger-Lickin’ Good?,” Good Housekeeping Saavy Consumer Product Tests,
http://magazines.ivillage.com/goodhousekeeping/print/0,,446041,00.html, 11 March 2004.


A study was conducted by the Florida Game and Fish Commission to assess the amounts of 
chemical residues found in the brain tissue of brown pelicans. In a test for DDT, random samples 
of $n_1 = 10$ juveniles and $n_2 = 13$ nestlings produced the results shown in the accompanying
table (measurements in parts per million, ppm). 


Juveniles               Nestlings
$n_1 = 10$              $n_2 = 13$
$\bar{y}_1 = .041$      $\bar{y}_2 = .026$
$s_1 = .017$            $s_2 = .006$


а Test the hypothesis that mean amounts of DDT found in juveniles and nestlings do not 
differ versus the alternative, that the juveniles have a larger mean. Use $\alpha = .05$. (This test
has important implications regarding the accumulation of DDT over time.)

b Is there evidence that the mean for juveniles exceeds that for nestlings by more than .01 ppm?

i Bound the p-value, using a table in the appendix.

ii Applet Exercise Find the exact p-value, using the appropriate applet.


Under normal conditions, is the average body temperature the same for men and women? 
Medical researchers interested in this question collected data from a large number of men and 
women, and random samples from that data are presented in the accompanying table.¹⁴ Is there
sufficient evidence to indicate that mean body temperatures differ for men and women? 


Body Temperatures (°F)


Men Women 
96.9 97.8 
97.4 98.0 
97.5 98.2 
97.8 98.2 
97.8 98.2 
97.9 98.6 
98.0 98.8 
98.6 99.2 
98.8 99.4 


a Bound the p-value, using a table in the appendix. 
b Applet Exercise Compute the p-value. 


14. Source: Journal of Statistics Education Data Archive,
http://www.amstat.org/publications/jse/jse-data-archive.html, March 2006.

15. Source: John Fetto, “Shop Around the Clock,” American Demographics, September 2003, p. 18.


An article in American Demographics investigated consumer habits at the mall. We tend to
spend the most money when shopping on weekends, particularly on Sundays between 4:00
and 6:00 P.M. Wednesday-morning shoppers spend the least.¹⁵ Independent random samples
of weekend and weekday shoppers were selected and the amount spent per trip to the mall was
recorded as shown in the following table:


Weekends               Weekdays
$n_1 = 20$             $n_2 = 20$
$\bar{y}_1 = \$78$     $\bar{y}_2 = \$67$
$s_1 = \$22$           $s_2 = \$20$


a Is there sufficient evidence to claim that there is a difference in the average amount spent
per trip on weekends and weekdays? Use $\alpha = .05$.


b What is the attained significance level? 


In Exercise 8.83, we presented some data collected in a study by Susan Beckham and her
colleagues. In this study, measurements were made of anterior compartment pressure (in
millimeters of mercury) for ten healthy runners and ten healthy cyclists. The data summary is
repeated here for your convenience.


                                    Runners           Cyclists
Condition                        Mean     s        Mean     s
Rest                             14.5    3.92      11.1    3.98
80% maximal O$_2$ consumption    12.2    3.49      11.5    4.95


a Is there sufficient evidence to justify claiming that a difference exists in mean compartment
pressures for runners and cyclists who are resting? Use $\alpha = .05$. Bound or determine the
associated p-value.

b Does sufficient evidence exist to permit us to identify a difference in mean compartment
pressures for runners and cyclists at 80% maximal O$_2$ consumption? Use $\alpha = .05$. Bound
or determine the associated p-value.


Refer to Exercise 8.88. A report from a testing laboratory claims that, for these species of fish, 
the average LC50 measurement is 6 ppm. Use the data of Exercise 8.88 to determine whether 
sufficient evidence exists to indicate that the average LC50 measurement is less than 6 ppm. 
Use $\alpha = .05$.


The tremendous growth of the Florida lobster (called spiny lobster) industry over the past 
20 years has made it the state's second most valuable fishery industry. A declaration by the 
Bahamian government that prohibits U.S. lobsterers from fishing on the Bahamian portion of 
the continental shelf was expected to reduce dramatically the landings in pounds per lobster 
trap. According to the records, the prior mean landings per trap was 30.31 pounds. A random 
sampling of 20 lobster traps since the Bahamian fishing restriction went into effect gave the 
following results (in pounds): 


17.5 18.9 39.6 34.4 19.6
33.7 37.2 43.4 41.7 27.5
24.1 39.6 12.2 25.5 22.1
29.3 21.1 23.8 43.2 24.4


Do these landings provide sufficient evidence to support the contention that the mean landings
per trap has decreased since imposition of the Bahamian restrictions? Test using $\alpha = .05$.


Jan Lindhe conducted a study¹⁶ on the effect of an oral antiplaque rinse on plaque buildup on
teeth. Fourteen subjects, whose teeth were thoroughly cleaned and polished, were randomly
assigned to two groups of seven subjects each. Both groups were assigned to use oral rinses (no
brushing) for a 2-week period. Group 1 used a rinse that contained an antiplaque agent. Group 2,
the control group, received a similar rinse except that, unknown to the subjects, the rinse
contained no antiplaque agent. A plaque index $y$, a measure of plaque buildup, was recorded at 4,
7, and 14 days. The mean and standard deviation for the 14-day plaque measurements for the
two groups are given in the following table:


                      Control Group   Antiplaque Group

Sample size                 7                 7
Mean                      1.26               .78
Standard deviation         .32               .32


a State the null and alternative hypotheses that should be used to test the effectiveness of the 
antiplaque oral rinse. 

b Do the data provide sufficient evidence to indicate that the oral antiplaque rinse is effective?
Test using $\alpha = .05$.

c Bound or find the p-value for the test.


In Exercise 8.90, we presented a summary of data regarding SAT scores (verbal and math) for
high school students who intended to major in engineering or in language and literature. The
data are summarized in the following table:


Prospective Major                 Verbal                         Math
Engineering ($n = 15$)            $\bar{y} = 446$, $s = 42$      $\bar{y} = 548$, $s = 57$
Language/literature ($n = 15$)    $\bar{y} = 534$, $s = 45$      $\bar{y} = 517$, $s = 52$


a Is there sufficient evidence to indicate a difference in mean verbal SAT scores for high school
students intending to major in engineering and in language/literature? Bound or determine
the associated p-value. What would you conclude at the $\alpha = .05$ significance level?

b Are the results you obtained in part (a) consistent with those you obtained in 
Exercise 8.90(a)? 

c Answer the questions posed in part (a) in relation to the mean math SAT scores for the two 
groups of students. 

d Are the results you obtained in part (c) consistent with those you obtained in
Exercise 8.90(b)?


Testing Hypotheses Concerning Variances 


We again assume that we have a random sample $Y_1, Y_2, \ldots, Y_n$ from a normal
distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. In Section 8.9, we used
the pivotal method to construct a confidence interval for the parameter $\sigma^2$. In this
section, we consider the problem of testing $H_0 : \sigma^2 = \sigma_0^2$ for some fixed value
$\sigma_0^2$ versus various alternative hypotheses. If $H_0$ is true and $\sigma^2 = \sigma_0^2$,
Theorem 7.3 implies that

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$$

has a $\chi^2$ distribution with $n - 1$ df. If we desire to test $H_0 : \sigma^2 = \sigma_0^2$
versus $H_a : \sigma^2 > \sigma_0^2$, we can use $\chi^2 = (n-1)S^2/\sigma_0^2$ as our test
statistic, but how should we select the rejection region RR?


FIGURE 10.10
Rejection regions RR for testing $H_0 : \sigma^2 = \sigma_0^2$ versus
(a) $H_a : \sigma^2 > \sigma_0^2$; (b) $H_a : \sigma^2 < \sigma_0^2$;
and (c) $H_a : \sigma^2 \neq \sigma_0^2$


16. Source: Jan Lindhe, “Clinical Assessment of Antiplaque Agents,” Compendium of Continuing
Education in Dentistry, suppl. no. 5, 1984.


If $H_a$ is true and the actual value of $\sigma^2$ is larger than $\sigma_0^2$, we would expect $S^2$
(which estimates the true value of $\sigma^2$) to be larger than $\sigma_0^2$. The larger $S^2$ is relative
to $\sigma_0^2$, the stronger is the evidence to support $H_a : \sigma^2 > \sigma_0^2$. Notice that $S^2$ is large
relative to $\sigma_0^2$ if and only if $\chi^2 = (n-1)S^2/\sigma_0^2$ is large. Thus, we see that a rejection
region of the form $RR = \{\chi^2 > k\}$ for some constant $k$ is appropriate for testing
$H_0 : \sigma^2 = \sigma_0^2$ versus $H_a : \sigma^2 > \sigma_0^2$. If we desire a test for which the probability of a
type I error is $\alpha$, we use the rejection region

$$RR = \{\chi^2 > \chi^2_\alpha\},$$

where $P(\chi^2 > \chi^2_\alpha) = \alpha$. (Values of $\chi^2_\alpha$ can be found in Table 6, Appendix 3.) An
illustration of this rejection region is found in Figure 10.10(a).

If we want to test $H_0 : \sigma^2 = \sigma_0^2$ versus $H_a : \sigma^2 < \sigma_0^2$ (a lower-tail alternative),
analogous reasoning leads to a rejection region located in the lower tail of the $\chi^2$
distribution. Alternatively, we can test $H_0 : \sigma^2 = \sigma_0^2$ versus $H_a : \sigma^2 \neq \sigma_0^2$ (a
two-tailed test) by using a two-tailed rejection region. Graphs illustrating these rejection
regions are given in Figure 10.10.


Test of Hypotheses Concerning a Population Variance


Assumptions: $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal
distribution with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$.

$H_0 : \sigma^2 = \sigma_0^2$

$H_a$: $\sigma^2 > \sigma_0^2$ (upper-tail alternative).
       $\sigma^2 < \sigma_0^2$ (lower-tail alternative).
       $\sigma^2 \neq \sigma_0^2$ (two-tailed alternative).

Test statistic: $\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}$

Rejection region: $\chi^2 > \chi^2_\alpha$ (upper-tail RR).
                  $\chi^2 < \chi^2_{1-\alpha}$ (lower-tail RR).
                  $\chi^2 > \chi^2_{\alpha/2}$ or $\chi^2 < \chi^2_{1-\alpha/2}$ (two-tailed RR).


Notice that $\chi^2_\alpha$ is chosen so that, for $\nu = n - 1$ df, $P(\chi^2 > \chi^2_\alpha) = \alpha$.
(See Table 6, Appendix 3.)


EXAMPLE 10.16


A company produces machined engine parts that are supposed to have a diameter
variance no larger than .0002 (diameters measured in inches). A random sample of
ten parts gave a sample variance of .0003. Test, at the 5% level, $H_0 : \sigma^2 = .0002$
against $H_a : \sigma^2 > .0002$.


Solution


If it is reasonable to assume that the measured diameters are normally distributed, the
appropriate test statistic is $\chi^2 = (n-1)S^2/\sigma_0^2$. Because we have posed an upper-tail
test, we reject $H_0$ for values of this statistic larger than $\chi^2_{.05} = 16.919$ (based on 9 df).
The observed value of the test statistic is

$$\frac{(n-1)s^2}{\sigma_0^2} = \frac{(9)(.0003)}{.0002} = 13.5.$$

Thus, $H_0$ is not rejected. There is not sufficient evidence to indicate that $\sigma^2$ exceeds
.0002 at the 5% level of significance.
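
The arithmetic of this upper-tail test can be checked with a minimal Python sketch. The function and variable names are illustrative assumptions, not part of the text, and the critical value 16.919 is simply copied from Table 6 as quoted in the example rather than computed.

```python
# Upper-tail chi-square test for a variance, using the numbers of Example 10.16.
# All names here are hypothetical; the critical value is taken from the text.

def chi_square_statistic(n, sample_variance, sigma0_sq):
    """Return the test statistic (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * sample_variance / sigma0_sq

n = 10                    # number of parts sampled
s_sq = 0.0003             # observed sample variance
sigma0_sq = 0.0002        # variance specified by H0
critical_value = 16.919   # chi-square_{.05} with 9 df, as quoted from Table 6

statistic = chi_square_statistic(n, s_sq, sigma0_sq)
reject_h0 = statistic > critical_value   # upper-tail rejection region

print(f"chi-square statistic = {statistic:.2f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

The statistic falls below the tabled critical value, so the sketch reaches the same decision as the worked solution.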


EXAMPLE 10.17


Determine the p-value associated with the statistical test of Example 10.16.


Solution


The p-value is the probability that a $\chi^2$ random variable with 9 df is larger than
the observed value of 13.5. The area corresponding to this probability is shaded in
Figure 10.11. By examining the row corresponding to 9 df in Table 6, Appendix 3,
we find that $\chi^2_{.1} = 14.6837$. As Figure 10.11 indicates, the shaded area exceeds .1, and
thus the p-value is more than .1. That is, for any value of $\alpha < .1$, the null hypothesis
cannot be rejected. This agrees with the conclusion of Example 10.16.

The exact p-value is easily obtained using the applet Chi-Square Probability and
Quantiles. As indicated in Figure 10.11, we require $P(\chi^2 > 13.5)$. When $\chi^2$ has
9 df, as in the present situation, the applet yields $P(\chi^2 > 13.5) = .14126$.


FIGURE 10.11
Illustration of the p-value for Example 10.17 ($\chi^2$ density with 9 df; the shaded area
lies to the right of 13.5, and 14.6837 is marked on the horizontal axis)
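
When the applet is not at hand, the same upper-tail probability can be approximated with the Python standard library. The sketch below is one possible approach and not the applet's method: `chi2_sf` is a hypothetical helper that assumes an integer number of degrees of freedom and uses the standard recursion $Q(x \mid k+2) = Q(x \mid k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$, started from the closed forms for 1 and 2 df.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x), built up two df at a time."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))   # closed form for 1 df
    else:
        k, q = 2, math.exp(-half)              # closed form for 2 df
    while k < df:
        # add the term that takes the tail area from k df to k + 2 df
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

p_value = chi2_sf(13.5, 9)   # observed statistic and df from Example 10.16
print(f"p-value = {p_value:.5f}")
```

The printed value should agree with the applet result quoted in the solution up to rounding, and it exceeds .1, as the table bound requires.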


EXAMPLE 10.18


An experimenter was convinced that the variability in his measuring equipment results
in a standard deviation of 2. Sixteen measurements yielded $s^2 = 6.1$. Do the data
disagree with his claim? Determine the p-value for the test. What would you conclude
if you chose $\alpha = .05$?


Solution


We require a test of $H_0 : \sigma^2 = 4$ versus $H_a : \sigma^2 \neq 4$, a two-tailed test. The value
of the test statistic is $\chi^2 = 15(6.1)/4 = 22.875$. Referring to Table 6, Appendix 3,
we see that, for 15 df, $\chi^2_{.05} = 24.9958$ and $\chi^2_{.10} = 22.3072$. Thus, the portion of the
p-value that falls in the upper tail is between .05 and .10. Because we need to account
for a corresponding equal area in the lower tail (this area is also between .05 and
.10), it follows that .1 < p-value < .2. Using the applet Chi-Square Probability and
Quantiles to compute the exact p-value, we obtain $P(\chi^2 > 22.8750) = .0868$, and
that p-value = 2(.0868) = .1736. Whether we use the bounds obtained from Table 6
or the exact p-value obtained from the applet, it is clear that the chosen value of
$\alpha = .05$ is smaller than the p-value; therefore, we cannot reject the experimenter's
claim at the $\alpha = .05$ level.
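
The two-tailed calculation can be sketched in Python as well. As before, `chi2_sf` is a hypothetical helper (integer df assumed) rather than anything supplied by the text or the applet, and doubling the smaller tail area is the convention used in the solution above.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x), built up two df at a time."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))   # closed form for 1 df
    else:
        k, q = 2, math.exp(-half)              # closed form for 2 df
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

n, s_sq, sigma0_sq = 16, 6.1, 4.0            # data and H0 value from Example 10.18
statistic = (n - 1) * s_sq / sigma0_sq       # (n - 1) s^2 / sigma_0^2
upper_tail = chi2_sf(statistic, n - 1)       # area to the right of the statistic
p_value = 2.0 * min(upper_tail, 1.0 - upper_tail)   # double the smaller tail

alpha = 0.05
print(f"statistic = {statistic:.3f}, upper tail = {upper_tail:.4f}")
print(f"two-tailed p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Taking the smaller of the two tails before doubling means the same lines would also handle a statistic that lands in the lower tail.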


Sometimes we wish to compare the variances of two normal distributions, particularly
by testing to determine whether they are equal. These problems are encountered
in comparing the precision of two measuring instruments, the variation in quality
characteristics of a manufactured product, or the variation in scores for two testing
procedures. For example, suppose that $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$ and $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$
are independent random samples from normal distributions with unknown means
and that $V(Y_{1i}) = \sigma_1^2$ and $V(Y_{2i}) = \sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are unknown.
Suppose that we want to test the null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ against the alternative
$H_a : \sigma_1^2 > \sigma_2^2$.

Because the sample variances $S_1^2$ and $S_2^2$ estimate the respective population variances,
we reject $H_0$ in favor of $H_a$ if $S_1^2$ is much larger than $S_2^2$. That is, we use a
rejection region RR of the form

$$RR = \left\{ \frac{S_1^2}{S_2^2} > k \right\},$$

where $k$ is chosen so that the probability of a type I error is $\alpha$. The appropriate value of $k$
depends on the probability distribution of the statistic $S_1^2/S_2^2$. Notice that $(n_1 - 1)S_1^2/\sigma_1^2$
and $(n_2 - 1)S_2^2/\sigma_2^2$ are independent $\chi^2$ random variables. From Definition 7.3, it
follows that

$$F = \frac{(n_1 - 1)S_1^2}{\sigma_1^2 (n_1 - 1)} \bigg/ \frac{(n_2 - 1)S_2^2}{\sigma_2^2 (n_2 - 1)} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2}$$

has an $F$ distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$
denominator degrees of freedom. Under the null hypothesis that $\sigma_1^2 = \sigma_2^2$, it follows
that $F = S_1^2/S_2^2$ and the rejection region RR given earlier is equivalent to
$RR = \{F > k\} = \{F > F_\alpha\}$, where $k = F_\alpha$ is the value of the $F$ distribution with
$\nu_1 = (n_1 - 1)$ and $\nu_2 = (n_2 - 1)$ such that $P(F > F_\alpha) = \alpha$. Values of $F_\alpha$ are given
in Table 7, Appendix 3. This rejection region is shown in Figure 10.12.


FIGURE 10.12
Rejection region RR for testing $H_0 : \sigma_1^2 = \sigma_2^2$ versus $H_a : \sigma_1^2 > \sigma_2^2$


EXAMPLE 10.19


Suppose that we wish to compare the variation in diameters of parts produced by
the company in Example 10.16 with the variation in diameters of parts produced by
a competitor. Recall that the sample variance for our company, based on $n = 10$
diameters, was $s_1^2 = .0003$. In contrast, the sample variance of the diameter
measurements for 20 of the competitor's parts was $s_2^2 = .0001$. Do the data provide sufficient
information to indicate a smaller variation in diameters for the competitor? Test with
$\alpha = .05$.


Solution


We are testing $H_0 : \sigma_1^2 = \sigma_2^2$ against the alternative $H_a : \sigma_1^2 > \sigma_2^2$. The test statistic,
$F = S_1^2/S_2^2$, is based on $\nu_1 = 9$ numerator and $\nu_2 = 19$ denominator degrees of
freedom, and we reject $H_0$ for values of $F$ larger than $F_{.05} = 2.42$. (See Table 7,
Appendix 3.) Because the observed value of the test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{.0003}{.0001} = 3,$$

we see that $F > F_{.05}$; therefore, at the $\alpha = .05$ level, we reject $H_0 : \sigma_1^2 = \sigma_2^2$ in
favor of $H_a : \sigma_1^2 > \sigma_2^2$ and conclude that the competing company produces parts with
smaller variation in their diameters.


EXAMPLE 10.20


Give bounds for the p-value associated with the data of Example 10.19. Use the applet
F-Ratio Probabilities and Quantiles to determine the exact p-value.


Solution


The calculated F-value for this upper-tail test is $F = 3$. Because this value is based
on $\nu_1 = 9$ and $\nu_2 = 19$ numerator and denominator degrees of freedom, respectively,
Table 7, Appendix 3, can be used to determine that $F_{.025} = 2.88$ whereas $F_{.01} = 3.52$.
Thus, the observed value, $F = 3$, would lead to rejection of the null hypothesis for
$\alpha = .025$ but not for $\alpha = .01$. Hence, .01 < p-value < .025.

We require p-value $= P(F > 3)$ when $F$ has an $F$ distribution with $\nu_1 = 9$
numerator degrees of freedom and $\nu_2 = 19$ denominator degrees of freedom. Direct
use of the applet yields that $P(F > 3) = .02096$, a value clearly between .01 and
.025, as indicated by the bounds for the p-value obtained from Table 7.


Suppose that, for Example 10.19, our research hypothesis was $H_a : \sigma_1^2 < \sigma_2^2$. How
would we proceed? We are at liberty to identify either population as population 1.
Therefore, if we simply interchange the arbitrary labels of 1 and 2 on the two populations
(and the corresponding identifiers on sample sizes, sample variances, etc.), our
alternative hypothesis becomes $H_a : \sigma_1^2 > \sigma_2^2$, and we can proceed as before. That is,
if the research hypothesis is that the variance of one population is larger than the
variance of another population, we identify the population with the hypothesized larger
variance as population 1 and proceed as indicated in the solution to Example 10.19.
Test of the Hypothesis $\sigma_1^2 = \sigma_2^2$


Assumptions: Independent samples from normal populations.

$H_0 : \sigma_1^2 = \sigma_2^2$.

$H_a : \sigma_1^2 > \sigma_2^2$.

Test statistic: $F = \dfrac{S_1^2}{S_2^2}$.

Rejection region: $F > F_\alpha$, where $F_\alpha$ is chosen so that $P(F > F_\alpha) = \alpha$
when $F$ has $\nu_1 = n_1 - 1$ numerator degrees of freedom and $\nu_2 = n_2 - 1$
denominator degrees of freedom. (See Table 7, Appendix 3.)


If we wish to test $H_0 : \sigma_1^2 = \sigma_2^2$ versus $H_a : \sigma_1^2 \neq \sigma_2^2$ with type I error probability
$\alpha$, we can employ $F = S_1^2/S_2^2$ as a test statistic and reject $H_0$ in favor of $H_a$ if the
calculated F-value is in either the upper or the lower $\alpha/2$ tail of the $F$ distribution.
The upper-tail critical values can be determined directly from Table 7, Appendix 3;
but how do we determine the lower-tail critical values?

Notice that $F = S_1^2/S_2^2$ and $F^{-1} = S_2^2/S_1^2$ both have $F$ distributions, but the
numerator and denominator degrees of freedom are interchanged (the process of
inversion switches the roles of numerator and denominator). Let $F_b^a$ denote a random
variable with an $F$ distribution with $a$ and $b$ numerator and denominator degrees of
freedom, respectively, and let $F_{b,\alpha/2}^a$ be such that

$$P(F_b^a > F_{b,\alpha/2}^a) = \alpha/2.$$

Then

$$P\left[(F_b^a)^{-1} < (F_{b,\alpha/2}^a)^{-1}\right] = \alpha/2$$

and, therefore,

$$P\left[F_a^b < (F_{b,\alpha/2}^a)^{-1}\right] = \alpha/2.$$

That is, the value that cuts off a lower-tail area of $\alpha/2$ for an $F_a^b$ distribution can be
found by inverting $F_{b,\alpha/2}^a$. Thus, if we use $F = S_1^2/S_2^2$ as a test statistic for testing
$H_0 : \sigma_1^2 = \sigma_2^2$ versus $H_a : \sigma_1^2 \neq \sigma_2^2$, the appropriate rejection region is

$$RR : \left\{ F > F_{n_2-1,\,\alpha/2}^{n_1-1} \ \text{ or } \ F < \left(F_{n_1-1,\,\alpha/2}^{n_2-1}\right)^{-1} \right\}.$$

An equivalent test (see Exercise 10.81) is obtained as follows. Let $n_L$ and $n_S$ denote
the sample sizes associated with the larger and smaller sample variances, respectively.
Place the larger sample variance in the numerator and the smaller sample variance in
the denominator of the $F$ statistic, and reject $H_0 : \sigma_1^2 = \sigma_2^2$ in favor of $H_a : \sigma_1^2 \neq \sigma_2^2$
if $F > F_{\alpha/2}$, where $F_{\alpha/2}$ is determined for $\nu_1 = n_L - 1$ and $\nu_2 = n_S - 1$ numerator
and denominator degrees of freedom, respectively.
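
The inversion rule for lower-tail critical values can be checked numerically. The sketch below is only an illustration under stated assumptions: `f_cdf` and `f_quantile` are hypothetical helpers (Simpson integration of the $F$ density, then bisection on the result), both degrees of freedom are assumed to exceed 2, and the values $a = 9$, $b = 13$, $\alpha = .10$ are chosen for convenience.

```python
import math

def f_cdf(f, d1, d2, steps=1000):
    """P(F <= f) for d1 numerator and d2 denominator df (assumes d1 > 2)."""
    if f <= 0.0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))

    def pdf(x):
        if x <= 0.0:
            return 0.0
        return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))

    h = f / steps
    total = pdf(0.0) + pdf(f)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)   # composite Simpson weights
    return total * h / 3.0

def f_quantile(p, d1, d2, lo=0.0, hi=100.0):
    """Value q with P(F <= q) = p, located by bisection on f_cdf."""
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

a, b, alpha = 9, 13, 0.10
upper = f_quantile(1 - alpha / 2, a, b)        # cuts off upper area alpha/2 for a, b df
lower_direct = f_quantile(alpha / 2, b, a)     # cuts off lower area alpha/2 for b, a df
lower_by_inversion = 1.0 / upper               # the shortcut derived in the text

print(f"upper alpha/2 point, ({a}, {b}) df: {upper:.3f}")
print(f"lower alpha/2 point, ({b}, {a}) df: {lower_direct:.4f}")
print(f"reciprocal of the upper point:      {lower_by_inversion:.4f}")
```

The last two printed numbers should coincide up to numerical error, which is the practical content of the identity: a table of upper-tail values is enough to obtain lower-tail critical values.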


EXAMPLE 10.21


An experiment to explore the pain thresholds to electrical shocks for males and females
resulted in the data summary given in Table 10.4. Do the data provide sufficient
evidence to indicate a significant difference in the variability of pain thresholds for
men and women? Use $\alpha = .10$. What can be said about the p-value?


Table 10.4 Data for Example 10.21

              Males   Females
$n$            14       10
$\bar{y}$      16.2     14.9
$s^2$          12.7     26.4


Solution


Let us assume that the pain thresholds for men and women are approximately normally
distributed. We desire to test $H_0 : \sigma_M^2 = \sigma_F^2$ versus $H_a : \sigma_M^2 \neq \sigma_F^2$, where $\sigma_M^2$ and $\sigma_F^2$
are the variances of pain thresholds for men and women, respectively. The larger
$S^2$ is 26.4 (the $S^2$ for women), and the sample size associated with the larger $S^2$ is
$n_L = 10$. The smaller $S^2$ is 12.7 (the $S^2$ for men), and $n_S = 14$ (the number of men
in the sample). Therefore, we compute

$$F = \frac{26.4}{12.7} = 2.079,$$

and we compare this value to $F_{\alpha/2} = F_{.05}$ with $\nu_1 = 10 - 1 = 9$ and $\nu_2 = 14 - 1 = 13$
numerator and denominator degrees of freedom, respectively. Because $F_{.05} = 2.71$
and because 2.079 is not larger than the critical value (2.71), insufficient evidence
exists to support a claim that the variability of pain thresholds differs for men and women.

The p-value associated with the observed value of $F$ for this two-tailed test can
be bounded as follows. Referring to Table 7, Appendix 3, with $\nu_1 = 9$, $\nu_2 = 13$
numerator and denominator degrees of freedom, respectively, we find $F_{.10} = 2.16$.
Thus, p-value > 2(.10) = .20. Unless we were willing to work with a very large
value of $\alpha$ (some value greater than .2), these results would not allow us to conclude
that the variances of pain thresholds differ for men and women.

The exact p-value is easily obtained using the applet F-Ratio Probabilities and
Quantiles. With 9 numerator and 13 denominator degrees of freedom, $P(F >
2.079) = .1005$ and p-value = 2(.1005) = .2010, a value larger than .20, as
determined through the use of Table 7.
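
The "larger variance on top" rule used in this example is easy to mechanize. The following Python sketch is illustrative only: the function name is hypothetical, and the critical value 2.71 is copied from the solution above (Table 7 with 9 and 13 df) rather than computed.

```python
def larger_over_smaller(s_sq_a, n_a, s_sq_b, n_b):
    """Return (F, numerator df, denominator df) with the larger sample variance on top."""
    if s_sq_a >= s_sq_b:
        return s_sq_a / s_sq_b, n_a - 1, n_b - 1
    return s_sq_b / s_sq_a, n_b - 1, n_a - 1

# Table 10.4: males (s^2 = 12.7, n = 14), females (s^2 = 26.4, n = 10)
f_stat, df_num, df_den = larger_over_smaller(12.7, 14, 26.4, 10)

critical_value = 2.71   # F_{.05} with 9 and 13 df, as quoted in the solution
reject_h0 = f_stat > critical_value

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) df")
print(f"reject H0 at alpha = .10 (two-tailed): {reject_h0}")
```

Because the function swaps the samples when needed, the caller never has to decide in advance which population is labeled 1.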




Although we used the notation $F$ in Example 10.21 to denote the ratio with the
larger $S^2$ in the numerator and the smaller $S^2$ in the denominator, this ratio does not
have an $F$ distribution (notice that the ratio defined in this way must be greater than
or equal to 1). Nevertheless, the tables of the $F$ distribution can be used to determine
the rejection region for an $\alpha$-level test (see Exercise 10.81).

Both the $\chi^2$ tests and the $F$ tests presented in this section are very sensitive to
departures from the assumption of normality of the underlying population(s). Thus,
unlike the t tests of Section 10.8, these tests are not robust if the normality assumption
is violated.


Exercises 


A manufacturer of hard safety hats for construction workers is concerned about the mean and
the variation of the forces its helmets transmit to wearers when subjected to a standard external
force. The manufacturer desires the mean force transmitted by helmets to be 800 pounds
(or less), well under the legal 1000-pound limit, and desires $\sigma$ to be less than 40. Tests were
run on a random sample of $n = 40$ helmets, and the sample mean and variance were found to
be equal to 825 pounds and 2350 pounds$^2$, respectively.


a If $\mu = 800$ and $\sigma = 40$, is it likely that any helmet subjected to the standard external force
will transmit a force to a wearer in excess of 1000 pounds? Explain.


b Do the data provide sufficient evidence to indicate that when subjected to the standard
external force, the helmets transmit a mean force exceeding 800 pounds?


c Do the data provide sufficient evidence to indicate that $\sigma$ exceeds 40?


The manufacturer of a machine to package soap powder claimed that her machine could load 
cartons at a given weight with a range of no more than .4 ounce. The mean and variance 
of a sample of eight 3-pound boxes were found to equal 3.1 and .018, respectively. Test the 
hypothesis that the variance of the population of weight measurements is $\sigma^2 = .01$ against the
alternative that $\sigma^2 > .01$.


a Use an $\alpha = .05$ level of significance. What assumptions are required for this test?

b What can be said about the attained significance level using a table in the appendix?

c Applet Exercise What can be said about the attained significance level using the appropriate
applet?


Under what assumptions may the F distribution be used in making inferences about the ratio 
of population variances? 


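From two normal populations with respective variances $\sigma_1^2$ and $\sigma_2^2$, we observe independent
sample variances $S_1^2$ and $S_2^2$, with corresponding degrees of freedom $\nu_1 = n_1 - 1$ and
$\nu_2 = n_2 - 1$. We wish to test $H_0 : \sigma_1^2 = \sigma_2^2$ versus $H_a : \sigma_1^2 \neq \sigma_2^2$.


a Show that the rejection region given by

$$\left\{ F > F_{\nu_2,\,\alpha/2}^{\nu_1} \ \text{ or } \ F < \left(F_{\nu_1,\,\alpha/2}^{\nu_2}\right)^{-1} \right\},$$

where $F = S_1^2/S_2^2$, is the same as the rejection region given by

$$\left\{ S_1^2/S_2^2 > F_{\nu_2,\,\alpha/2}^{\nu_1} \ \text{ or } \ S_2^2/S_1^2 > F_{\nu_1,\,\alpha/2}^{\nu_2} \right\}.$$


b Let $S_L^2$ denote the larger of $S_1^2$ and $S_2^2$ and let $S_S^2$ denote the smaller of $S_1^2$ and $S_2^2$. Let $\nu_L$
and $\nu_S$ denote the degrees of freedom associated with $S_L^2$ and $S_S^2$, respectively. Use part (a)
to show that, under $H_0$,

$$P\left(S_L^2/S_S^2 > F_{\nu_S,\,\alpha/2}^{\nu_L}\right) = \alpha.$$

Notice that this gives an equivalent method for testing the equality of two variances.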


Exercises 8.83 and 10.73 presented some data collected in a 1993 study by Susan Beckham and
her colleagues. In this study, measurements of anterior compartment pressure (in millimeters
of mercury) were taken for ten healthy runners and ten healthy cyclists. The researchers also
obtained pressure measurements for the runners and cyclists at maximal O$_2$ consumption. The
data summary is given in the accompanying table.


                                    Runners           Cyclists
Condition                        Mean     s        Mean     s
Rest                             14.5    3.92      11.1    3.98
80% maximal O$_2$ consumption    12.2    3.49      11.5    4.95
Maximal O$_2$ consumption        19.1   16.9       12.2    4.67

а Is there sufficient evidence to support a claim that the variability of compartment pressure 
differs for runners and cyclists who are resting? Use $\alpha = .05$.

b i What can be said about the attained significance level using a table in the appendix?
ii Applet Exercise What can be said about the attained significance level using the
appropriate applet?

c Is there sufficient evidence to support a claim that the variability in compartment pressure
between runners and cyclists differs at maximal O$_2$ consumption? Use $\alpha = .05$.

d i What can be said about the attained significance level using a table in the appendix?
ii Applet Exercise What can be said about the attained significance level using the
appropriate applet?


The manager of a dairy is in the market for a new bottle-filling machine and is considering 
machines manufactured by companies A and B. If ruggedness, cost, and convenience are 
comparable in the two machines, the deciding factor will be the variability of fills (the machine 
producing fills with the smaller variance being preferable). Let $\sigma_1^2$ and $\sigma_2^2$ be the fill variances
for machines produced by companies A and B, respectively. Now consider various tests of the
null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$. Obtaining samples of fills from the two machines and using the
test statistic $S_1^2/S_2^2$, we could set up as the rejection region an upper-tail area, a lower-tail area,
or a two-tailed area of the F distribution, depending on the interests to be served. Identify the 
type of rejection region that would be most favored by the following persons, and explain why. 


a The manager of the dairy 
b A salesperson for company A

c A salesperson for company B


An experiment published in The American Biology Teacher studied the efficacy of using 95% 
ethanol and 20% bleach as disinfectants for removing bacterial and fungal contamination when 
culturing plant tissues. The experiment was repeated 15 times with each disinfectant, using 
eggplant as the plant tissue cultured.¹⁷ Five cuttings per plant were placed on a petri dish,
disinfected using each agent, and stored at 25°C for 4 weeks. The observations reported were
the number of uncontaminated eggplant cuttings after the 4 weeks of storage. Relevant data are
given in the following table. Are you willing to assume that the underlying population variances
are equal?


Disinfectant    95% Ethanol    20% Bleach

Mean               3.73           4.80
Variance           2.78095        0.17143
$n$                15             15


a What can be said about the attained significance level using the F table in the appendix?

b Applet Exercise What can be said about the attained significance level using the applet
F-Ratio Probabilities and Quantiles?

c What would you conclude, with $\alpha = .02$?


17. Source: Michael Brehm, J. Buguliskis, D. Hawkins, E. Lee, D. Sabapathi, and R. Smith,
“Determining Differences in Efficacy of Two Disinfectants Using t tests,” The American Biology
Teacher 58(2), (1996): 111.


Applet Exercise A precision instrument is guaranteed to be accurate to within 2 units. A 
sample of four instrument readings on the same object yielded the measurements 353, 351, 
351, and 355. Give the attained significance level for testing the null hypothesis $\sigma = .7$ versus
the alternative hypothesis $\sigma > .7$.


Aptitude tests should produce scores with a large amount of variation so that an administrator 
can distinguish between persons with low aptitude and persons with high aptitude. The standard 
test used by a certain industry has been producing scores with a standard deviation of 10 points. 
A new test is given to 20 prospective employees and produces a sample standard deviation 
of 12 points. Are scores from the new test significantly more variable than scores from the 
standard? Use $\alpha = .01$.


Refer to Exercise 10.70. Is there sufficient evidence, at the 5% significance level, to support 
concluding that the variance in measurements of DDT levels is greater for juveniles than it is 
for nestlings? 


Power of Tests and the 
Neyman-Pearson Lemma 


In the remaining sections of this chapter, we move from practical examples of statis- 
tical tests to a theoretical discussion of their properties. We have suggested specific 
tests for a number of practical hypothesis testing situations, but you may wonder why 
we chose those particular tests. How did we decide on the test statistics that were 
presented, and how did we know that we had selected the best rejection regions? 

The goodness of a test is measured by α and β, the probabilities of type I and type II
errors, respectively. Typically, the value of α is chosen in advance and determines the
location of the rejection region. A related but more useful concept for evaluating the
performance of a test is called the power of the test. Basically, the power of a test is
the probability that the test will lead to rejection of the null hypothesis.


Suppose that W is the test statistic and RR is the rejection region for a test of
a hypothesis involving the value of a parameter θ. Then the power of the test,
denoted by power(θ), is the probability that the test will lead to rejection of H₀
when the actual parameter value is θ. That is,


    power(θ) = P(W in RR when the parameter value is θ).


Suppose that we want to test the null hypothesis H₀: θ = θ₀ and that θₐ is a
particular value for θ chosen from Hₐ. The power of the test at θ = θ₀, power(θ₀), is
equal to the probability of rejecting H₀ when H₀ is true. That is, power(θ₀) = α, the
probability of a type I error. For any value of θ from Hₐ, the power of a test measures
the test's ability to detect that the null hypothesis is false. That is, for θ = θₐ,

    power(θₐ) = P(rejecting H₀ when θ = θₐ).

If we express the probability β of a type II error when θ = θₐ as β(θₐ), then

    β(θₐ) = P(accepting H₀ when θ = θₐ).


FIGURE 10.13
A typical power curve for the test of H₀: θ = θ₀ against the alternative Hₐ: θ ≠ θ₀
(vertical axis: power; horizontal axis: θ, with θ₀ marked)


FIGURE 10.14
Ideal power curve for the test of H₀: θ = θ₀ versus Hₐ: θ ≠ θ₀


It follows that the power of the test at θₐ and the probability of a type II error are
related as follows.


Relationship Between Power and β
If θₐ is a value of θ in the alternative hypothesis Hₐ, then


    power(θₐ) = 1 − β(θₐ).


A typical power curve, a graph of power(θ), is shown in Figure 10.13.
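
To make the shape of such a curve concrete, the following minimal sketch tabulates the power function of a two-tailed Z test of H₀: μ = μ₀ for a normal mean with known σ. The particular numbers (μ₀ = 0, σ = 1, n = 25, α = .05) and the function name `two_tailed_power` are illustrative assumptions, not values taken from the text.

```python
from statistics import NormalDist

def two_tailed_power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0: mu = mu0) for a two-tailed Z test when the true mean is mu.

    All default values are hypothetical and chosen only for illustration.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Z = sqrt(n) * (Ybar - mu0) / sigma is N(shift, 1) when the true mean is mu.
    shift = (mu - mu0) * n ** 0.5 / sigma
    # Reject when Z < -z_crit or Z > z_crit.
    return z.cdf(-z_crit - shift) + (1 - z.cdf(z_crit - shift))

for mu in (-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0):
    print(f"mu = {mu:5.2f}   power = {two_tailed_power(mu):.4f}")
```

At μ = μ₀ the tabulated value equals α; at any other μ the value is 1 − β(μ), and it rises toward 1 as μ moves away from μ₀ in either direction.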

Ideally, a test would detect a departure from H₀: θ = θ₀ with certainty; that is,
power(θₐ) would be 1 for all θₐ in Hₐ (see Figure 10.14). Because, for a fixed sample
size, α and β both cannot be made arbitrarily small, this is clearly not possible.
Therefore, for a fixed sample size n, we adopt the procedure of selecting a (small)
value for α and finding a rejection region RR to minimize β(θₐ) at each θₐ in Hₐ.
Equivalently, we choose RR to maximize power(θ) for θ in Hₐ. From among all tests
with a significance level of α, we seek the test whose power function comes closest
to the ideal power function (Figure 10.14) if such a test exists. How do we find such
a testing procedure?

Before we proceed, we must define simple and composite hypotheses. Suppose
that Y₁, Y₂, ..., Yₙ constitute a random sample from an exponential distribution with
parameter λ; that is, f(y) = (1/λ)e^(−y/λ), y > 0. Then the hypothesis H: λ = 2
uniquely specifies the distribution from which the sample is taken as having density
function f(y) = (1/2)e^(−y/2), y > 0. The hypothesis H: λ = 2 is therefore an
example of a simple hypothesis. In contrast, the hypothesis H*: λ > 2 is a composite
hypothesis because under H* the density function f(y) is not uniquely determined.
The form of the density is exponential, but the parameter λ could be 3 or 15 or any
value greater than 2.


If a random sample is taken from a distribution with parameter θ, a hypothesis
is said to be a simple hypothesis if that hypothesis uniquely specifies the
distribution of the population from which the sample is taken. Any hypothesis that
is not a simple hypothesis is called a composite hypothesis.


If Y₁, Y₂, ..., Yₙ represent a random sample from a normal distribution with known
variance σ² = 1, then H: μ = 5 is a simple hypothesis because, if H is true, the
density function is uniquely specified to be a normal density function with μ = 5 and
σ² = 1. If, on the other hand, σ² is not known, the hypothesis H: μ = 5 determines
the mean of the normal distribution but does not determine the value of the variance.
Therefore, if σ² is not known, H: μ = 5 is a composite hypothesis.

Suppose that we would like to test a simple null hypothesis H₀: θ = θ₀ versus a
simple alternative hypothesis Hₐ: θ = θₐ. Because we are concerned only with two
particular values of θ (θ₀ and θₐ), we would like to choose a rejection region RR so
that α = power(θ₀) is a fixed value and power(θₐ) is as large as possible. That is, we
seek a most powerful α-level test. The following theorem provides the methodology
for deriving the most powerful test for testing simple H₀ versus simple Hₐ. [Note: As
in Definition 9.4, we use the notation L(θ) = L(y₁, y₂, ..., yₙ | θ) to indicate that
the likelihood function depends on y₁, y₂, ..., yₙ and on θ.]


THEOREM 10.1
The Neyman-Pearson Lemma Suppose that we wish to test the simple null
hypothesis H₀: θ = θ₀ versus the simple alternative hypothesis Hₐ: θ = θₐ,
based on a random sample Y₁, Y₂, ..., Yₙ from a distribution with parameter θ.
Let L(θ) denote the likelihood of the sample when the value of the parameter
is θ. Then, for a given α, the test that maximizes the power at θₐ has a rejection
region, RR, determined by

$$\frac{L(\theta_0)}{L(\theta_a)} < k.$$

The value of k is chosen so that the test has the desired value for α. Such a test
is a most powerful α-level test for H₀ versus Hₐ.


The proof of Theorem 10.1 is not given here, but it can be found in some of the 
texts listed in the references at the end of the chapter. We illustrate the application of 
the theorem with the following example. 


EXAMPLE 10.22

Suppose that Y represents a single observation from a population with probability
density function given by

$$f(y \mid \theta) = \begin{cases} \theta y^{\theta - 1}, & 0 < y < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the most powerful test with significance level α = .05 to test H₀: θ = 2 versus
Hₐ: θ = 1.


Solution

Because both of the hypotheses are simple, Theorem 10.1 can be applied to derive
the required test. In this case,

$$\frac{L(\theta_0)}{L(\theta_a)} = \frac{f(y \mid \theta_0)}{f(y \mid \theta_a)} = \frac{2y}{(1)y^{0}} = 2y, \qquad \text{for } 0 < y < 1,$$

and the form of the rejection region for the most powerful test is

$$2y < k.$$

Equivalently, the rejection region RR is {y < k/2}. Or because k/2 = k*, a constant,
the rejection region is RR: {y < k*}.
Because α = .05 is specified, the value of k* is determined by

$$.05 = P(Y \text{ in RR when } \theta = 2) = P(Y < k^* \text{ when } \theta = 2) = \int_0^{k^*} 2y \, dy = (k^*)^2.$$

Therefore, (k*)² = .05, and the rejection region of the most powerful test is

    RR: {y < √.05 = .2236}.

Among all tests for H₀ versus Hₐ based on a sample size of 1 and with α fixed at
.05, this test has the largest possible value for power(θₐ) = power(1). Equivalently,
among all tests with α = .05 this test has the smallest type II error probability when
β(θₐ) is evaluated at θₐ = 1. What is the actual value for power(θ) when θ = 1?

$$\text{power}(1) = P(Y \text{ in RR when } \theta = 1) = P(Y < .2236 \text{ when } \theta = 1) = \int_0^{.2236} (1) \, dy = .2236.$$

Even though the rejection region {y < .2236} gives the maximum value for power(1)
among all tests with α = .05, we see that β(1) = 1 − .2236 = .7764 is still very
large.
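
The arithmetic of Example 10.22 can be checked with a short sketch. It uses the fact that the density θy^(θ−1) on (0, 1) has distribution function F(y | θ) = y^θ; the helper name `cdf` is an assumption introduced here for illustration.

```python
alpha = 0.05
theta_0, theta_a = 2, 1

def cdf(y, theta):
    """F(y | theta) = y**theta for 0 < y < 1, the integral of theta * y**(theta - 1)."""
    return y ** theta

# Choose k* so that P(Y < k* | theta_0) = alpha, i.e. (k*)**theta_0 = alpha.
k_star = alpha ** (1 / theta_0)
power = cdf(k_star, theta_a)   # P(Y < k* | theta_a)
beta = 1 - power               # type II error probability at theta_a

print(round(k_star, 4), round(power, 4), round(beta, 4))  # prints 0.2236 0.2236 0.7764
```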


Notice that the forms of the test statistic and of the rejection region depend on both
H₀ and Hₐ. If the alternative is changed to Hₐ: θ = 4, the most powerful test is based
on Y², and we reject H₀ in favor of Hₐ if Y² > k′, for some constant k′. Also notice
that the Neyman-Pearson lemma gives the form of the rejection region; the actual
rejection region depends on the specified value for α.

For discrete distributions, it is not always possible to find a test whose significance
level is exactly equal to some predetermined value of α. In such cases, we specify
the test to be the one for which the probability of a type I error is closest to the
predetermined value of α without exceeding it.
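
As a rough illustration of the discrete case, the sketch below searches for the upper-tail rejection region of a binomial test whose type I error probability is as close to the target α as possible without exceeding it. The setting (n = 20 trials, H₀: p = .5, rejection for large Y, target α = .05) and the function names are hypothetical choices made only for this illustration.

```python
from math import comb

def binom_upper_tail(c, n, p):
    """P(Y >= c) when Y is binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(c, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c such that P(Y >= c | p0) <= alpha.

    The tail probability decreases as c grows, so the first c that satisfies
    the bound gives the attained level closest to alpha from below.
    """
    for c in range(n + 2):          # c = n + 1 gives an empty region (level 0)
        if binom_upper_tail(c, n, p0) <= alpha:
            return c

c = critical_value(20, 0.5, 0.05)
print(c, round(binom_upper_tail(c, 20, 0.5), 4))  # prints 15 0.0207
```

Here the region {Y ≥ 14} would have level about .058, which exceeds .05, so the test uses {Y ≥ 15} with attained level about .021.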

Suppose that we sample from a population whose distribution is completely specified
except for the value of a single parameter θ. If we desire to test H₀: θ = θ₀
(simple) versus Hₐ: θ > θ₀ (composite), no general theorem comparable to Theorem 10.1
is applicable if either hypothesis is composite. However, Theorem 10.1 can
be applied to obtain a most powerful test for H₀: θ = θ₀ versus Hₐ: θ = θₐ for any
single value θₐ, where θₐ > θ₀. In many situations, the actual rejection region for
the most powerful test depends only on the value of θ₀ (and does not depend on the
particular choice of θₐ). When a test obtained by Theorem 10.1 actually maximizes
the power for every value of θ greater than θ₀, it is said to be a uniformly most powerful
test for H₀: θ = θ₀ versus Hₐ: θ > θ₀. Analogous remarks apply to the derivation
of tests for H₀: θ = θ₀ versus Hₐ: θ < θ₀. We illustrate these ideas in the following
example.


EXAMPLE 10.23

Suppose that Y₁, Y₂, ..., Yₙ constitute a random sample from a normal distribution
with unknown mean μ and known variance σ². We wish to test H₀: μ = μ₀ against
Hₐ: μ > μ₀ for a specified constant μ₀. Find the uniformly most powerful test with
significance level α.


Solution


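We begin by looking for the most powerful α-level test of H₀: μ = μ₀ versus Hₐ*: μ = μₐ
for one fixed value of μₐ that is larger than μ₀. Because

$$f(y \mid \mu) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right) \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right], \qquad -\infty < y < \infty,$$

we have

$$L(\mu) = f(y_1 \mid \mu) f(y_2 \mid \mu) \cdots f(y_n \mid \mu) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(y_i - \mu)^2}{2\sigma^2}\right].$$

[Recall that exp(w) is simply e^w in another form.] Because both H₀ and Hₐ* are simple
hypotheses, Theorem 10.1 implies that the most powerful test of H₀: μ = μ₀ versus
Hₐ*: μ = μₐ is given by

$$\frac{L(\mu_0)}{L(\mu_a)} < k,$$

which in this case is equivalent to

$$\frac{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \dfrac{(y_i - \mu_0)^2}{2\sigma^2}\right]}{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \dfrac{(y_i - \mu_a)^2}{2\sigma^2}\right]} < k.$$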

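This inequality can be rearranged as follows:

$$\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} (y_i - \mu_0)^2 - \sum_{i=1}^{n} (y_i - \mu_a)^2\right]\right\} < k.$$

Taking natural logarithms and simplifying, we have

$$-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} (y_i - \mu_0)^2 - \sum_{i=1}^{n} (y_i - \mu_a)^2\right] < \ln(k)$$

$$\sum_{i=1}^{n} (y_i - \mu_0)^2 - \sum_{i=1}^{n} (y_i - \mu_a)^2 > -2\sigma^2 \ln(k)$$

$$\sum_{i=1}^{n} y_i^2 - 2n\bar{y}\mu_0 + n\mu_0^2 - \sum_{i=1}^{n} y_i^2 + 2n\bar{y}\mu_a - n\mu_a^2 > -2\sigma^2 \ln(k)$$

$$\bar{y}(\mu_a - \mu_0) > \frac{-2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2}{2n}$$

or, since μₐ > μ₀,

$$\bar{y} > \frac{-2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2}{2n(\mu_a - \mu_0)}.$$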
Because σ², n, μ₀, and μₐ are all known constants, the quantity on the right-hand
side of this inequality is a constant; call it k′. Therefore, the most powerful test of
H₀: μ = μ₀ versus Hₐ*: μ = μₐ has the rejection region given by

    RR = {ȳ > k′}.

The precise value of k′ is determined by fixing α and noting that

$$\alpha = P(\bar{Y} \text{ in RR when } \mu = \mu_0) = P(\bar{Y} > k' \text{ when } \mu = \mu_0)$$

$$= P\left(\frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > \frac{k' - \mu_0}{\sigma/\sqrt{n}}\right) = P\left(Z > \sqrt{n}(k' - \mu_0)/\sigma\right).$$

Because, under H₀, Z has a standard normal distribution, P(Z > z_α) = α and the
required value for k′ must satisfy

    √n(k′ − μ₀)/σ = z_α, or equivalently, k′ = μ₀ + z_α σ/√n.

Thus, the α-level test that has the largest possible value for power(μₐ) is based on
the statistic Ȳ and has rejection region RR = {ȳ > μ₀ + z_α σ/√n}. We now observe
that neither the test statistic nor the rejection region for this α-level test depends on
the particular value assigned to μₐ. That is, for any value of μₐ greater than μ₀, we
obtain exactly the same rejection region. Thus, the α-level test with the rejection region
previously given has the largest possible value for power(μₐ) for every μₐ > μ₀. It is
the uniformly most powerful test for H₀: μ = μ₀ versus Hₐ: μ > μ₀. This is exactly
the test that we considered in Section 10.3.
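
The rejection region found in Example 10.23 is easy to evaluate numerically. The sketch below computes k′ = μ₀ + z_α σ/√n and the resulting power at several alternatives; the values μ₀ = 10, σ = 2, n = 16, α = .05 and the function names are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ump_critical_value(mu0, sigma, n, alpha):
    """k' = mu0 + z_alpha * sigma / sqrt(n); note that no mu_a appears."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)

def ump_power(mu_a, mu0, sigma, n, alpha):
    """P(Ybar > k') when Ybar is normal with mean mu_a and sd sigma / sqrt(n)."""
    k_prime = ump_critical_value(mu0, sigma, n, alpha)
    return 1 - NormalDist(mu_a, sigma / sqrt(n)).cdf(k_prime)

# Hypothetical illustration values, not taken from the text.
mu0, sigma, n, alpha = 10.0, 2.0, 16, 0.05
print(f"k' = {ump_critical_value(mu0, sigma, n, alpha):.4f}")
for mu_a in (10.0, 10.5, 11.0, 11.5, 12.0):
    print(f"mu_a = {mu_a:4.1f}   power = {ump_power(mu_a, mu0, sigma, n, alpha):.4f}")
```

Because the same k′ serves every μₐ > μ₀, one rejection region produces the whole power curve, which is what makes the test uniformly most powerful.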


Again consider the situation where the random sample is taken from a distribution
that is completely specified except for the value of a single parameter θ. If we wish to
derive a test for H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ (so that both H₀ and Hₐ are composite
hypotheses), how do we proceed? Suppose that we use the method illustrated in
Example 10.23 to find a uniformly most powerful test for H₀′: θ = θ₀ versus Hₐ: θ > θ₀.
If θ₁ is a fixed value of θ that is less than θ₀ and we use the same test for H₀″: θ = θ₁
versus Hₐ, typically, α will decrease and power(θₐ) will remain unchanged for all θₐ
in Hₐ. In other words, if we have a good test for discriminating between H₀′ and Hₐ,
the same test will be even better for discriminating between H₀″ and Hₐ. For tests with
composite null hypotheses of the form H₀: θ ≤ θ₀ (or H₀: θ ≥ θ₀), we define the
significance level α to be the probability of a type I error when θ = θ₀; that is, α =
power(θ₀). Generally, this value for α is the maximum value of the power function
for θ ≤ θ₀ (or θ ≥ θ₀). Using this methodology, we can show that the test derived in
Example 10.23 for testing H₀: θ = θ₀ versus Hₐ: θ > θ₀ is also the uniformly most
powerful α-level test for testing H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀.

In Example 10.23, we derived the uniformly most powerful test for H₀: μ = μ₀
versus Hₐ: μ > μ₀ and found it to have rejection region {ȳ > μ₀ + z_α σ/√n}. If
we wished to test H₀: μ = μ₀ versus Hₐ: μ < μ₀, analogous calculations would
lead us to {ȳ < μ₀ − z_α σ/√n} as the rejection region for the test that is uniformly
most powerful for all μₐ < μ₀. Therefore, if we wish to test H₀: μ = μ₀ versus
Hₐ: μ ≠ μ₀, no single rejection region yields the most powerful test for all values
of μₐ ≠ μ₀. Although there are some special exceptions, in most instances there do
not exist uniformly most powerful two-tailed tests. Thus, there are many null and
alternative hypotheses for which uniformly most powerful tests do not exist.

The Neyman-Pearson lemma is useless if we wish to test a hypothesis about a single
parameter θ when the sampled distribution contains other unspecified parameters. For
example, we might want to test H₀: μ = μ₀ when the sample is taken from a normal
distribution with unknown variance σ². In this case, H₀: μ = μ₀ does not uniquely
determine the form of the distribution (since σ² could be any nonnegative number),
and it is therefore not a simple hypothesis. The next section presents a very general and
widely used method for developing tests of hypotheses. The method is particularly
useful when unspecified parameters (called nuisance parameters) are present.


Exercises 

Refer to Exercise 10.2. Find the power of the test for each alternative in (a)-(d).
a p = .4.

b p = .5.

c p = .6.

d p = .7.

e Sketch a graph of the power function.

Refer to Exercise 10.5. Find the power of test 1 for each alternative in (a)-(e).
a θ = .1.

b θ = .4.

c θ = .7.

d θ = 1.

e Sketch a graph of the power function.

Refer to Exercise 10.5.


a Find the power of test 2 for each of the following alternatives: θ = .1, θ = .4, θ = .7, and
θ = 1.

b Sketch a graph of the power function.

c Compare the power function in part (b) with the power function that you found in Exercise
10.89 (this is the power function for test 1, Exercise 10.5). What can you conclude about
the power of test 2 compared to the power of test 1 for all θ > 0?


Let Y₁, Y₂, ..., Y₂₀ be a random sample of size n = 20 from a normal distribution with unknown
mean μ and known variance σ² = 5. We wish to test H₀: μ = 7 versus Hₐ: μ > 7.
a Find the uniformly most powerful test with significance level .05.


b For the test in part (a), find the power at each of the following alternative values for
μ: μₐ = 7.5, 8.0, 8.5, and 9.0.


c Sketch a graph of the power function.


Consider the situation described in Exercise 10.91. What is the smallest sample size such that
an α = .05-level test has power at least .80 when μ = 8?


For a normal distribution with mean μ and variance σ² = 25, an experimenter wishes to test
H₀: μ = 10 versus Hₐ: μ = 5. Find the sample size n for which the most powerful test will
have α = β = .025.


Suppose that Y₁, Y₂, ..., Yₙ constitute a random sample from a normal distribution with known
mean μ and unknown variance σ². Find the most powerful α-level test of H₀: σ² = σ₀² versus
Hₐ: σ² = σ₁², where σ₁² > σ₀². Show that this test is equivalent to a χ² test. Is the test uniformly
most powerful for Hₐ: σ² > σ₀²?


Suppose that we have a random sample of four observations from the density function

$$f(y \mid \theta) = \begin{cases} \left(\dfrac{1}{2\theta^3}\right) y^2 e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$


a Find the rejection region for the most powerful test of H₀: θ = θ₀ against Hₐ: θ = θₐ,
assuming that θₐ > θ₀. [Hint: Make use of the χ² distribution.]


b Is the test given in part (a) uniformly most powerful for the alternative θ > θ₀?


Suppose Y is a random sample of size 1 from a population with density function

$$f(y \mid \theta) = \begin{cases} \theta y^{\theta - 1}, & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases}$$

where θ > 0.


a Sketch the power function of the test with rejection region: Y > .5.


b Based on the single observation Y, find a uniformly most powerful test of size α for testing
H₀: θ = 1 versus Hₐ: θ > 1.


Let Y₁, Y₂, ..., Yₙ be independent and identically distributed random variables with discrete
probability function given by


    y          1      2           3

    p(y | θ)   θ²     2θ(1 − θ)   (1 − θ)²


where 0 < θ < 1. Let Nᵢ denote the number of observations equal to i for i = 1, 2, 3.

a Derive the likelihood function L(θ) as a function of N₁, N₂, and N₃.

b Find the most powerful test for testing H₀: θ = θ₀ versus Hₐ: θ = θₐ, where θₐ > θ₀.
Show that your test specifies that H₀ be rejected for certain values of 2N₁ + N₂.

c How do you determine the value of k so that the test has nominal level α? You need not do
the actual computation. A clear description of how to determine k is adequate.


d Is the test derived in parts (a)-(c) uniformly most powerful for testing H₀: θ = θ₀ versus
Hₐ: θ > θ₀? Why or why not?


Let Y₁, Y₂, ..., Yₙ be a random sample from the probability density function given by

$$f(y \mid \theta) = \begin{cases} \left(\dfrac{1}{\theta}\right) m y^{m-1} e^{-y^m/\theta}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases}$$

with m denoting a known constant.


a Find the uniformly most powerful test for testing H₀: θ = θ₀ against Hₐ: θ > θ₀.


b If the test in part (a) is to have θ₀ = 100, α = .05, and β = .05 when θₐ = 400, find the
appropriate sample size and critical region.


Let Y₁, Y₂, ..., Yₙ denote a random sample from a population having a Poisson distribution
with mean λ.


a Find the form of the rejection region for a most powerful test of H₀: λ = λ₀ against
Hₐ: λ = λₐ, where λₐ > λ₀.

b Recall that ΣYᵢ (summed over i = 1 to n) has a Poisson distribution with mean nλ. Indicate how this
information can be used to find any constants associated with the rejection region derived in part (a).

c Is the test derived in part (a) uniformly most powerful for testing H₀: λ = λ₀ against
Hₐ: λ > λ₀? Why?

d Find the form of the rejection region for a most powerful test of H₀: λ = λ₀ against
Hₐ: λ = λₐ, where λₐ < λ₀.


Let Y₁, Y₂, ..., Yₙ denote a random sample from a population having a Poisson distribution with
mean λ₁. Let X₁, X₂, ..., Xₘ denote an independent random sample from a population having
a Poisson distribution with mean λ₂. Derive the most powerful test for testing H₀: λ₁ = λ₂ = 2
versus Hₐ: λ₁ = 1/2, λ₂ = 3.


Suppose that Y₁, Y₂, ..., Yₙ denote a random sample from a population having an exponential
distribution with mean θ.


a Derive the most powerful test for H₀: θ = θ₀ against Hₐ: θ = θₐ, where θₐ < θ₀.


b Is the test derived in part (a) uniformly most powerful for testing H₀: θ = θ₀ against
Hₐ: θ < θ₀?


Let Y₁, Y₂, ..., Yₙ denote a random sample from a Bernoulli-distributed population with
parameter p. That is,

$$p(y_i \mid p) = p^{y_i}(1 - p)^{1 - y_i}, \qquad y_i = 0, 1.$$

a Suppose that we are interested in testing H₀: p = p₀ versus Hₐ: p = pₐ, where p₀ < pₐ.


i Show that

$$\frac{L(p_0)}{L(p_a)} = \left[\frac{p_0(1 - p_a)}{(1 - p_0)p_a}\right]^{\sum y_i} \left(\frac{1 - p_0}{1 - p_a}\right)^{n}.$$

ii Argue that L(p₀)/L(pₐ) < k if and only if Σyᵢ > k* for some constant k*.


iii Give the rejection region for the most powerful test of H₀ versus Hₐ.


b Recall that ΣYᵢ (summed over i = 1 to n) has a binomial distribution with parameters n and p.
Indicate how to determine the values of any constants contained in the rejection region derived in part
[a(iii)].

c Is the test derived in part (a) uniformly most powerful for testing H₀: p = p₀ versus
Hₐ: p > p₀? Why or why not?


Let Y₁, Y₂, ..., Yₙ denote a random sample from a uniform distribution over the interval (0, θ).


a Find the most powerful α-level test for testing H₀: θ = θ₀ against Hₐ: θ = θₐ, where
θₐ < θ₀.

b Is the test in part (a) uniformly most powerful for testing H₀: θ = θ₀ against Hₐ: θ < θ₀?


Refer to the random sample of Exercise 10.103.


a Find the most powerful α-level test for testing H₀: θ = θ₀ against Hₐ: θ = θₐ, where
θₐ > θ₀.

b Is the test in part (a) uniformly most powerful for testing H₀: θ = θ₀ against Hₐ: θ > θ₀?


c Is the most powerful α-level test that you found in part (a) unique?


Likelihood Ratio Tests 


Theorem 10.1 provides a method of constructing most powerful tests for simple 
hypotheses when the distribution of the observations is known except for the value of 
a single unknown parameter. This method can sometimes be used to find uniformly 
most powerful tests for composite hypotheses that involve a single parameter. In 
many cases, the distribution of concern has more than one unknown parameter. In 
this section, we present a very general method that can be used to derive tests of 
hypotheses. The procedure works for simple or composite hypotheses and whether 
or not other parameters with unknown values are present. 

Suppose that a random sample is selected from a distribution and that the likelihood
function L(y₁, y₂, ..., yₙ | θ₁, θ₂, ..., θₖ) is a function of k parameters, θ₁, θ₂, ..., θₖ.
To simplify notation, let Θ denote the vector of all k parameters, that is, Θ =
(θ₁, θ₂, ..., θₖ), and write the likelihood function as L(Θ). It may be the case that
we are interested in testing hypotheses only about one of the parameters, say, θ₁. For
example, if, as in Example 10.24, we take a sample from a normally distributed
population with unknown mean μ and unknown variance σ², then the likelihood function
depends on the two parameters μ and σ², and Θ = (μ, σ²). If we are interested in
testing hypotheses about only the mean μ, then σ², a parameter not of particular
interest to us, is called a nuisance parameter. Thus, the likelihood function may be
a function with both unknown nuisance parameters and a parameter of interest.

Suppose that the null hypothesis specifies that Θ (may be a vector) lies in a
particular set of possible values, say, Ω₀, and that the alternative hypothesis specifies
that Θ lies in another set of possible values Ωₐ, which does not overlap Ω₀. For
example, if we sample from a population with an exponential distribution with mean
λ (in this case, λ is the only parameter of the distribution, and Θ = λ), we might be
interested in testing H₀: λ = λ₀ versus Hₐ: λ ≠ λ₀. In this exponential example, Ω₀
contains only the single value λ₀ and Ωₐ = {λ > 0: λ ≠ λ₀}. Denote the union of
the two sets, Ω₀ and Ωₐ, by Ω; that is, Ω = Ω₀ ∪ Ωₐ. In the exponential example,
Ω = {λ₀} ∪ {λ > 0: λ ≠ λ₀} = {λ: λ > 0}, the set of all possible values for λ. Either
or both of the hypotheses H₀ and Hₐ can be composite because they might contain
multiple values of the parameter of interest or because other unknown parameters
may be present.

Let L(Ω̂₀) denote the maximum (actually the supremum) of the likelihood function
for all Θ ∈ Ω₀. That is, L(Ω̂₀) = max over Θ ∈ Ω₀ of L(Θ). Notice that L(Ω̂₀) represents the
best explanation for the observed data for all Θ ∈ Ω₀ and can be found by using
methods similar to those used in Section 9.7. Similarly, L(Ω̂) = max over Θ ∈ Ω of L(Θ)
represents the best explanation for the observed data for all Θ ∈ Ω = Ω₀ ∪ Ωₐ. If
L(Ω̂₀) = L(Ω̂), then a best explanation for the observed data can be found inside Ω₀,
and we should not reject the null hypothesis H₀: Θ ∈ Ω₀. However, if L(Ω̂₀) < L(Ω̂),
then the best explanation for the observed data can be found inside Ωₐ, and we should
consider rejecting H₀ in favor of Hₐ. A likelihood ratio test is based on the ratio
L(Ω̂₀)/L(Ω̂).


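A Likelihood Ratio Test
Define λ by

$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \frac{\max_{\Theta \in \Omega_0} L(\Theta)}{\max_{\Theta \in \Omega} L(\Theta)}.$$

A likelihood ratio test of H₀: Θ ∈ Ω₀ versus Hₐ: Θ ∈ Ωₐ employs λ as a test
statistic, and the rejection region is determined by λ < k.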


It can be shown that 0 ≤ λ ≤ 1. A value of λ close to zero indicates that the
likelihood of the sample is much smaller under H₀ than it is under Hₐ. Therefore, the data
suggest favoring Hₐ over H₀. The actual value of k is chosen so that α achieves the
desired value. We illustrate the mechanics of this method with the following example.


EXAMPLE 10.24

Suppose that Y₁, Y₂, ..., Yₙ constitute a random sample from a normal distribution
with unknown mean μ and unknown variance σ². We want to test H₀: μ = μ₀ versus
Hₐ: μ > μ₀. Find the appropriate likelihood ratio test.


Solution

In this case, Θ = (μ, σ²). Notice that Ω₀ is the set {(μ₀, σ²): σ² > 0}, Ωₐ =
{(μ, σ²): μ > μ₀, σ² > 0}, and hence that Ω = Ω₀ ∪ Ωₐ = {(μ, σ²): μ ≥ μ₀,
σ² > 0}. The constant value of the variance σ² is completely unspecified. We must
now find L(Ω̂₀) and L(Ω̂).

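For the normal distribution, we have

$$L(\Theta) = L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left[-\sum_{i=1}^{n} \frac{(y_i - \mu)^2}{2\sigma^2}\right].$$

Restricting μ to Ω₀ implies that μ = μ₀, and we can find L(Ω̂₀) if we determine
the value of σ² that maximizes L(μ, σ²) subject to the constraint that μ = μ₀. From
Example 9.15, we see that when μ = μ₀ the value of σ² that maximizes L(μ₀, σ²) is

$$\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu_0)^2.$$

Thus, L(Ω̂₀) is obtained by replacing μ with μ₀ and σ² with σ̂₀² in L(μ, σ²), which
gives

$$L(\hat{\Omega}_0) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}_0^2}\right)^{n/2} \exp\left[-\sum_{i=1}^{n} \frac{(y_i - \mu_0)^2}{2\hat{\sigma}_0^2}\right] = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}_0^2}\right)^{n/2} e^{-n/2}.$$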

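We now turn to finding L(Ω̂). As in Example 9.15, it is easier to look at ln L(μ, σ²),

$$\ln[L(\mu, \sigma^2)] = -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2.$$

Taking derivatives with respect to μ and σ², we obtain

$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu),$$

$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial \sigma^2} = -\left(\frac{n}{2\sigma^2}\right) + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2.$$

We need to find the maximum of L(μ, σ²) over the set Ω = {(μ, σ²): μ ≥ μ₀,
σ² > 0}. Notice that

$$\partial L(\mu, \sigma^2)/\partial \mu < 0 \text{ if } \mu > \bar{y}, \qquad \partial L(\mu, \sigma^2)/\partial \mu = 0 \text{ if } \mu = \bar{y}, \qquad \partial L(\mu, \sigma^2)/\partial \mu > 0 \text{ if } \mu < \bar{y}.$$

Thus, over the set Ω = {(μ, σ²): μ ≥ μ₀, σ² > 0}, ln L(μ, σ²) [and also L(μ, σ²)]
is maximized at μ̂, where

$$\hat{\mu} = \begin{cases} \bar{y}, & \text{if } \bar{y} > \mu_0, \\ \mu_0, & \text{if } \bar{y} \le \mu_0. \end{cases}$$

Just as earlier, the value of σ² in Ω that maximizes L(μ̂, σ²) is

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})^2.$$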

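L(Ω̂) is obtained by replacing μ with μ̂ and σ² with σ̂², which yields

$$L(\hat{\Omega}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}^2}\right)^{n/2} \exp\left[-\sum_{i=1}^{n} \frac{(y_i - \hat{\mu})^2}{2\hat{\sigma}^2}\right] = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}^2}\right)^{n/2} e^{-n/2}.$$

Thus,

$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2}\right)^{n/2} = \begin{cases} \left[\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \mu_0)^2}\right]^{n/2}, & \text{if } \bar{y} > \mu_0, \\ 1, & \text{if } \bar{y} \le \mu_0. \end{cases}$$

Notice that λ is always less than or equal to 1. Thus, "small" values of λ are those
less than some k < 1. Because

$$\sum_{i=1}^{n} (y_i - \mu_0)^2 = \sum_{i=1}^{n} [(y_i - \bar{y}) + (\bar{y} - \mu_0)]^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2,$$

if k < 1, it follows that the rejection region, λ < k, is equivalent to

$$\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \mu_0)^2} < k^{2/n} = k'$$

$$\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2} < k'$$

$$\frac{1}{1 + \dfrac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}} < k'.$$

This inequality in turn is equivalent to

$$\frac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} > \frac{1}{k'} - 1 = k''$$

$$\frac{n(\bar{y} - \mu_0)^2}{\dfrac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2} > (n - 1)k''$$

or, because ȳ > μ₀ when λ < k < 1,

$$\frac{\sqrt{n}(\bar{y} - \mu_0)}{s} > \sqrt{(n - 1)k''},$$

where

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Notice that √n(Ȳ − μ₀)/S is the t statistic employed in previous sections.
Consequently, the likelihood ratio test is equivalent to the t test of Section 10.8.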
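
The equivalence established in Example 10.24 can be checked numerically. Dividing the identity Σ(yᵢ − μ₀)² = Σ(yᵢ − ȳ)² + n(ȳ − μ₀)² by Σ(yᵢ − ȳ)² shows that, when ȳ > μ₀, λ = [1 + t²/(n − 1)]^(−n/2), where t = √n(ȳ − μ₀)/s. The sketch below confirms this relation on a small made-up data set; the data values, μ₀ = 5, and the function names are hypothetical.

```python
from math import sqrt

def likelihood_ratio(y, mu0):
    """lambda for H0: mu = mu0 versus Ha: mu > mu0, normal data, variance unknown."""
    n = len(y)
    ybar = sum(y) / n
    if ybar <= mu0:
        return 1.0
    ss_about_ybar = sum((v - ybar) ** 2 for v in y)
    ss_about_mu0 = sum((v - mu0) ** 2 for v in y)
    return (ss_about_ybar / ss_about_mu0) ** (n / 2)

def t_statistic(y, mu0):
    """t = sqrt(n) * (ybar - mu0) / s, with s**2 the usual sample variance."""
    n = len(y)
    ybar = sum(y) / n
    s = sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return sqrt(n) * (ybar - mu0) / s

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]   # made-up data for illustration
mu0 = 5.0
n = len(sample)
lam = likelihood_ratio(sample, mu0)
t = t_statistic(sample, mu0)
print(lam, (1 + t**2 / (n - 1)) ** (-n / 2))   # the two values agree
```

Because λ is a decreasing function of t on the region ȳ > μ₀, rejecting for small λ is the same as rejecting for large t.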


Situations in which the likelihood ratio test assumes a well-known form are not
uncommon. In fact, all the tests of Sections 10.8 and 10.9 can be obtained by the
likelihood ratio method. For most practical problems, the likelihood ratio method
produces the best possible test, in terms of power.

Unfortunately, the likelihood ratio method does not always produce a test statistic
with a known probability distribution, such as the t statistic of Example 10.24. If the
sample size is large, however, we can obtain an approximation to the distribution of λ
if some reasonable "regularity conditions" are satisfied by the underlying population
distribution(s). These are general conditions that hold for most (but not all) of the
distributions that we have considered. The regularity conditions mainly involve the
existence of derivatives, with respect to the parameters, of the likelihood function.
Another key condition is that the region over which the likelihood function is positive
cannot depend on unknown parameter values.


THEOREM 10.2

Let Y₁, Y₂, ..., Yₙ have joint likelihood function L(Θ). Let r₀ denote the number
of free parameters that are specified by H₀: Θ ∈ Ω₀ and let r denote the
number of free parameters specified by the statement Θ ∈ Ω. Then, for large
n, −2 ln(λ) has approximately a χ² distribution with r₀ − r df.


The proof of this result is beyond the scope of this text. Theorem 10.2 allows us
to use the table of the χ² distribution to find rejection regions with fixed α when n
is large. Notice that −2 ln(λ) is a decreasing function of λ. Because the likelihood
ratio test specifies that we use RR: {λ < k}, this rejection region may be rewritten as
RR: {−2 ln(λ) > −2 ln(k) = k*}. For large sample sizes, if we desire an α-level
test, Theorem 10.2 implies that k* ≈ χ²_α. That is, a large-sample likelihood ratio test
has rejection region given by


    −2 ln(λ) > χ²_α, where χ²_α is based on r₀ − r df.


The size of the sample necessary for a "good" approximation varies from application
to application. It is important to realize that large-sample likelihood ratio tests are
based on −2 ln(λ), where λ is the original likelihood ratio, λ = L(Ω̂₀)/L(Ω̂).
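
When r₀ − r = 1, the χ² critical value can be obtained without a table, because a χ² random variable with 1 df is the square of a standard normal random variable, so χ²_α = (z_{α/2})². The sketch below uses that fact; the function names are assumptions introduced for illustration, and the helper covers only the 1-df case.

```python
from math import sqrt
from statistics import NormalDist

def chi2_1df_critical(alpha):
    """Upper-alpha point of chi-square with 1 df, computed as z_{alpha/2} squared."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def chi2_1df_pvalue(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) for x >= 0."""
    return 2 * (1 - NormalDist().cdf(sqrt(x)))

def large_sample_lrt_rejects(neg2_log_lambda, alpha):
    """Large-sample likelihood ratio test with 1 df: reject when -2 ln(lambda) > chi2_alpha."""
    return neg2_log_lambda > chi2_1df_critical(alpha)

print(round(chi2_1df_critical(0.05), 3), round(chi2_1df_critical(0.01), 3))  # prints 3.841 6.635
```

For more than 1 df a χ² table or a statistical library would be needed instead.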


EXAMPLE 10.25

Suppose that an engineer wishes to compare the number of complaints per week filed
by union stewards for two different shifts at a manufacturing plant. One hundred
independent observations on the number of complaints gave means x̄ = 20 for shift
1 and ȳ = 22 for shift 2. Assume that the number of complaints per week on the ith
shift has a Poisson distribution with mean θᵢ, for i = 1, 2. Use the likelihood ratio
method to test H₀: θ₁ = θ₂ versus Hₐ: θ₁ ≠ θ₂ with α ≈ .01.


Solution

The likelihood of the sample is now the joint probability function of all xᵢ's and yⱼ's
and is given by

$$L(\theta_1, \theta_2) = \left(\frac{1}{k}\right) \theta_1^{\sum x_i} e^{-n\theta_1} \theta_2^{\sum y_j} e^{-n\theta_2},$$

where k = x₁! ··· xₙ! y₁! ··· yₙ!, and n = 100. In this example, Θ = (θ₁, θ₂) and
Ω₀ = {(θ₁, θ₂): θ₁ = θ₂ = θ}, where θ is unknown. Hence, under H₀ the likelihood
function is a function of the single parameter θ, and

$$L(\theta) = \left(\frac{1}{k}\right) \theta^{\sum x_i + \sum y_j} e^{-2n\theta}.$$

Notice that, for Θ ∈ Ω₀, L(θ) is maximized when θ is equal to its maximum likelihood
estimate,

$$\hat{\theta} = \frac{1}{2n}\left(\sum_{i=1}^{n} x_i + \sum_{j=1}^{n} y_j\right) = \frac{1}{2}(\bar{x} + \bar{y}).$$

In this example, Ωₐ = {(θ₁, θ₂): θ₁ ≠ θ₂} and Ω = {(θ₁, θ₂): θ₁ > 0, θ₂ > 0}. Using
the general likelihood L(θ₁, θ₂), a function of both θ₁ and θ₂, we see that L(θ₁, θ₂) is
maximized when θ̂₁ = x̄ and θ̂₂ = ȳ, respectively. That is, L(θ₁, θ₂) is maximized
when both θ₁ and θ₂ are replaced by their maximum likelihood estimates. Thus,

$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \frac{k^{-1} (\hat{\theta})^{n\bar{x} + n\bar{y}} e^{-2n\hat{\theta}}}{k^{-1} (\hat{\theta}_1)^{n\bar{x}} (\hat{\theta}_2)^{n\bar{y}} e^{-n\hat{\theta}_1 - n\hat{\theta}_2}} = \frac{(\hat{\theta})^{n\bar{x} + n\bar{y}}}{(\bar{x})^{n\bar{x}} (\bar{y})^{n\bar{y}}}.$$

Notice that λ is a complicated function of x̄ and ȳ. The observed value of θ̂ is
(1/2)(x̄ + ȳ) = (1/2)(20 + 22) = 21. The observed value of λ is

$$\lambda = \frac{21^{(100)(20 + 22)}}{20^{(100)(20)} \, 22^{(100)(22)}},$$

and hence

    −2 ln(λ) = −(2)[4200 ln(21) − 2000 ln(20) − 2200 ln(22)] = 9.53.

In this application, the number of free parameters in $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$
is $k = 2$. In $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2 = \theta\}$, $r_0 = 1$ of these free parameters is fixed.
In the set $\Omega$, $r = 0$ of the parameters are fixed. Theorem 10.2 implies that $-2\ln(\lambda)$
has an approximately $\chi^2$ distribution with $r_0 - r = 1 - 0 = 1$ df. Small values of
$\lambda$ correspond to large values of $-2\ln(\lambda)$, so the rejection region for a test at
approximately the $\alpha = .01$ level contains the values of $-2\ln(\lambda)$ that exceed $\chi^2_{.01} = 6.635$,
the value that cuts off an area of .01 in the right-hand tail of a $\chi^2$ density with 1 df.
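
The arithmetic in this example can be checked numerically. The sketch below uses only the quantities given in Example 10.25; the variable names are illustrative, and the 1-df tail area is obtained from the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)) rather than from a table.

```python
import math

# Data from Example 10.25: n = 100 weekly observations per shift.
n = 100
xbar, ybar = 20.0, 22.0

# MLE of the common mean under H0 (theta1 = theta2).
theta_hat = (xbar + ybar) / 2.0

# ln(lambda) = (n*xbar + n*ybar) ln(theta_hat) - n*xbar ln(xbar) - n*ybar ln(ybar);
# the 1/k factors and the exponential factors cancel in the ratio.
log_lambda = (n * (xbar + ybar) * math.log(theta_hat)
              - n * xbar * math.log(xbar)
              - n * ybar * math.log(ybar))
statistic = -2.0 * log_lambda

# Upper-tail area of a chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(statistic / 2.0))

CRITICAL_01_1DF = 6.635  # tabled chi-square value quoted in the example
reject = statistic > CRITICAL_01_1DF
print(round(statistic, 2), reject)
```

Working on the log scale avoids forming the enormous powers in the ratio directly, which would overflow ordinary floating-point arithmetic.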

Because the observed value of $-2\ln(\lambda)$ is larger than $\chi^2_{.01}$, we reject $H_0: \theta_1 = \theta_2$.
We conclude, at approximately the $\alpha = .01$ level of significance, that the mean
numbers of complaints filed by the union stewards do differ.


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal distribution with mean $\mu$ (unknown)
and variance $\sigma^2$. For testing $H_0: \sigma^2 = \sigma_0^2$ against $H_a: \sigma^2 > \sigma_0^2$, show that the likelihood ratio
test is equivalent to the $\chi^2$ test given in Section 10.9.


A survey of voter sentiment was conducted in four midcity political wards to compare the
fraction of voters favoring candidate A. Random samples of 200 voters were polled in each of
the four wards, with the results as shown in the accompanying table. The numbers of voters
favoring A in the four samples can be regarded as four independent binomial random variables.
Construct a likelihood ratio test of the hypothesis that the fractions of voters favoring candidate
A are the same in all four wards. Use $\alpha = .05$.

                          Ward
Opinion             1     2     3     4    Total
Favor A            76    53    59    48     236
Do not favor A    124   147   141   152     564
Total             200   200   200   200     800


Let $S_1^2$ and $S_2^2$ denote, respectively, the variances of independent random samples of sizes $n$
and $m$ selected from normal distributions with means $\mu_1$ and $\mu_2$ and common variance $\sigma^2$. If
$\mu_1$ and $\mu_2$ are unknown, construct a likelihood ratio test of $H_0: \sigma^2 = \sigma_0^2$ against $H_a: \sigma^2 = \sigma_a^2$,
assuming that $\sigma_a^2 > \sigma_0^2$.


Suppose that $X_1, X_2, \ldots, X_{n_1}$, $Y_1, Y_2, \ldots, Y_{n_2}$, and $W_1, W_2, \ldots, W_{n_3}$ are independent random
samples from normal distributions with respective unknown means $\mu_1$, $\mu_2$, and $\mu_3$ and
variances $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$.

a Find the likelihood ratio test for $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$ against the alternative of at least one
inequality.

b Find an approximate critical region for the test in part (a) if $n_1$, $n_2$, and $n_3$ are large and
$\alpha = .05$.


Let $X_1, X_2, \ldots, X_m$ denote a random sample from the exponential density with mean $\theta_1$, and
let $Y_1, Y_2, \ldots, Y_n$ denote an independent random sample from an exponential density with
mean $\theta_2$.

a Find the likelihood ratio criterion for testing $H_0: \theta_1 = \theta_2$ versus $H_a: \theta_1 \neq \theta_2$.

b Show that the test in part (a) is equivalent to an exact F test. [Hint: Transform $\sum X_i$ and
$\sum Y_j$ to $\chi^2$ random variables.]


Show that a likelihood ratio test depends on the data only through the value of a sufficient 
statistic. [Hint: Use the factorization criterion.] 


Suppose that we are interested in testing the simple null hypothesis $H_0: \theta = \theta_0$ versus the
simple alternative hypothesis $H_a: \theta = \theta_a$. According to the Neyman-Pearson lemma, the test
that maximizes the power at $\theta_a$ has a rejection region determined by

\[ \frac{L(\theta_0)}{L(\theta_a)} < k. \]

In the context of a likelihood ratio test, if we are interested in the simple $H_0$ and $H_a$, as stated,
then $\Omega_0 = \{\theta_0\}$, $\Omega_a = \{\theta_a\}$, and $\Omega = \{\theta_0, \theta_a\}$.

a Show that the likelihood ratio $\lambda$ is given by

\[ \lambda = \frac{L(\theta_0)}{\max\{L(\theta_0), L(\theta_a)\}} = \frac{1}{\max\left\{1, \dfrac{L(\theta_a)}{L(\theta_0)}\right\}}. \]

b Argue that $\lambda < k$ if and only if, for some constant $k'$,

\[ \frac{L(\theta_0)}{L(\theta_a)} < k'. \]

c What do the results in parts (a) and (b) imply about likelihood ratio tests when both the
null and alternative hypotheses are simple?


Suppose that independent random samples of sizes $n_1$ and $n_2$ are to be selected from normal
populations with means $\mu_1$ and $\mu_2$, respectively, and common variance $\sigma^2$. For testing $H_0: \mu_1 = \mu_2$
versus $H_a: \mu_1 - \mu_2 > 0$ ($\sigma^2$ unknown), show that the likelihood ratio test reduces to the
two-sample t test presented in Section 10.8.


Refer to Exercise 10.112. Show that in testing of $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$ ($\sigma^2$
unknown) the likelihood ratio test reduces to the two-sample t test.


Refer to Exercise 10.113. Suppose that another independent random sample of size $n_3$ is selected
from a third normal population with mean $\mu_3$ and variance $\sigma^2$. Find the likelihood ratio test
for testing $H_0: \mu_1 = \mu_2 = \mu_3$ versus the alternative that there is at least one inequality. Show
that this test is equivalent to an exact F test.


Summary 


In Chapters 8-10, we have presented the basic concepts associated with two methods
for making inferences: estimation and tests of hypotheses. Philosophically, estimation
(Chapters 8 and 9) focuses on this question: What is the numerical value of a parameter
$\theta$? In contrast, a test of a hypothesis attempts to answer this question: Is there enough
evidence to support the alternative hypothesis? Often, the inferential method that
you employ for a given situation depends on how you, the experimenter, prefer to
phrase your inference. Sometimes this decision is taken out of your hands. That
is, the practical question clearly implies that either an estimation or a hypothesis-testing
procedure be used. For example, acceptance or rejection of incoming supplies
or outgoing products in a manufacturing process clearly requires a decision, or a
statistical test. We have seen that a duality exists between these two inference-making
procedures. A two-sided confidence interval with confidence coefficient $1 - \alpha$ may
be viewed as the set of all values of $\theta_0$ that are “acceptable” null hypothesis values for
$\theta$ if we use a two-sided $\alpha$-level test. Similarly, a two-sided $\alpha$-level test for $H_0: \theta = \theta_0$
can be implemented by constructing a two-sided confidence interval (with confidence
coefficient $1 - \alpha$) and rejecting $H_0$ if the value $\theta_0$ falls outside the confidence interval.
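
The duality described above can be illustrated with a short sketch. It assumes the large-sample Z interval and Z test for a mean from the earlier chapters; the summary statistics and the function names are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_interval(ybar, s, n, alpha):
    """Large-sample two-sided confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / n ** 0.5
    return ybar - half_width, ybar + half_width

def z_test_rejects(ybar, s, n, theta0, alpha):
    """Two-sided large-sample Z test of H0: theta = theta0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs((ybar - theta0) / (s / n ** 0.5)) > z

# Hypothetical summary statistics, not taken from the text.
ybar, s, n, alpha = 50.0, 12.0, 64, 0.05
lower, upper = z_interval(ybar, s, n, alpha)

# The test rejects exactly those theta0 values that fall outside the interval.
candidates = [45.0, 47.5, 49.0, 50.0, 52.5, 53.5]
agreement = all(
    z_test_rejects(ybar, s, n, t0, alpha) == (t0 < lower or t0 > upper)
    for t0 in candidates
)
```

Here `agreement` is true because both procedures compare the same standardized distance with the same normal quantile.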

Associated with both methods for making inferences are measures of their goodness.
Thus, the expected width of a confidence interval and the confidence coefficient
both measure the goodness of the estimation procedure. Likewise, the goodness of a
statistical test is measured by the probabilities $\alpha$ and $\beta$ of type I and type II errors.
These measures of goodness enable us to compare one statistical test with another
and to develop a theory for acquiring statistical tests with desirable properties. The
ability to evaluate the goodness of an inference is one of the major contributions of
statistics to the analysis of experimental data. Of what value is an inference if you
have no measure of its validity?

In this chapter, we have investigated the elements of a statistical test and discussed
how a test works. Some useful tests are given to show how they can be used in practical
situations, and you will see other interesting applications in the chapters that follow.

Many of the testing procedures developed in this chapter were presented from an
intuitive perspective. However, we have also illustrated the use of the Neyman-Pearson
lemma in deriving most powerful procedures for testing a simple null hypothesis
versus a simple alternative hypothesis. In addition, we have seen how the
Neyman-Pearson method can sometimes be used to find uniformly most powerful tests for
composite null and alternative hypotheses if the underlying distribution is specified
except for the value of a single parameter. The likelihood ratio procedure provides
a general method for developing a statistical test. Likelihood ratio tests can be
derived whether or not nuisance parameters are present. In general, likelihood ratio tests
possess desirable properties. The Neyman-Pearson and likelihood ratio procedures
both require that the distribution of the sampled population(s) must be known, except
for the values of some parameters. Otherwise, the likelihood functions cannot be
determined and the methods cannot be applied.


Supplementary Exercises


True or False.

a If the p-value for a test is .036, the null hypothesis can be rejected at the $\alpha = .05$ level of
significance.

b In a formal test of hypothesis, $\alpha$ is the probability that the null hypothesis is incorrect.

c If the p-value is very small for a test to compare two population means, the difference
between the means must be large.

d Power($\theta^*$) is the probability that the null hypothesis is rejected when $\theta = \theta^*$.

e Power($\theta$) is always computed by assuming that the null hypothesis is true.

f If .01 < p-value < .025, the null hypothesis can always be rejected at the $\alpha = .02$ level
of significance.

g Suppose that a test is a uniformly most powerful $\alpha$-level test regarding the value of a
parameter $\theta$. If $\theta_a$ is a value in the alternative hypothesis, $\beta(\theta_a)$ might be smaller for some
other $\alpha$-level test.

h When developing a likelihood ratio test, it is possible that $L(\hat{\Omega}_0) > L(\hat{\Omega})$.

i $-2\ln(\lambda)$ is always positive.

Refer to Exercise 10.6. Find power(p), for p = .2, .3, .4, .5, .6, .7, and .8 and draw a rough
sketch of the power function.


Lord Rayleigh was one of the earliest scientists to study the density of nitrogen. In his studies,
he noticed something peculiar. The nitrogen densities produced from chemical compounds
tended to be smaller than the densities of nitrogen produced from the air. Lord Rayleigh's
measurements¹⁸ are given in the following table. These measurements correspond to the mass
of nitrogen filling a flask of specified volume under specified temperature and pressure.

Chemical Compound    Atmosphere
2.30143              2.31017
2.29890              2.30986
2.29816              2.31010
2.30182              2.31001
2.29869              2.31024
2.29940              2.31010
2.29849              2.31028
2.29889              2.31163
2.30074              2.30956
2.30054

a For the measurements from the chemical compound, $\bar{y} = 2.29971$ and $s = .001310$; for
the measurements from the atmosphere, $\bar{y} = 2.310217$ and $s = .000574$. Is there sufficient
evidence to indicate a difference in the mean mass of nitrogen per flask for chemical
compounds and air? What can be said about the p-value associated with your test?

b Find a 95% confidence interval for the difference in mean mass of nitrogen per flask for
chemical compounds and air.

c Based on your answer to part (b), at the $\alpha = .05$ level of significance, is there sufficient
evidence to indicate a difference in mean mass of nitrogen per flask for measurements from
chemical compounds and air?

d Is there any conflict between your conclusions in parts (a) and (b)? Although the difference
in these mean nitrogen masses is small, Lord Rayleigh emphasized this difference rather
than ignoring it, and this led to the discovery of inert gases in the atmosphere.


The effect of alcohol consumption on the body appears to be much greater at higher altitudes.
To test this theory, a scientist randomly selected 12 subjects and divided them into two groups
of 6 each. One group was transported to an altitude of 12,000 feet, and each member in the
group ingested 100 cubic centimeters (cm³) of alcohol. The members of the second group were
taken to sea level and given the same amount of alcohol. After 2 hours, the amount of alcohol
in the blood of each subject was measured (measurements in grams/100 cm³). The data are
given in the following table. Is there sufficient evidence to indicate that retention of alcohol is
greater at 12,000 feet than at sea level? Test at the $\alpha = .10$ level of significance.

Sea Level    12,000 feet
.07          .13
.10          .17
.09          .15
.12          .14
.09          .10
.13          .14


18. Source: Proceedings, Royal Society (London) 55 (1894): 340-344.


Currently, 20% of potential customers buy soap of brand A. To increase sales, the company
will conduct an extensive advertising campaign. At the end of the campaign, a sample of 400
potential customers will be interviewed to determine whether the campaign was successful.

a State $H_0$ and $H_a$ in terms of $p$, the probability that a customer prefers soap brand A.

b The company decides to conclude that the advertising campaign was a success if at least
92 of the 400 customers interviewed prefer brand A. Find $\alpha$. (Use the normal approximation
to the binomial distribution to evaluate the desired probability.)


In the past, a chemical plant has produced an average of 1100 pounds of chemical per day. The
records for the past year, based on 260 operating days, show the following:

$\bar{y} = 1060$ pounds/day, $s = 340$ pounds/day.

We wish to test whether the average daily production has dropped significantly over the past
year.

a Give the appropriate null and alternative hypotheses.

b If Z is used as a test statistic, determine the rejection region corresponding to a level of
significance of $\alpha = .05$.

c Do the data provide sufficient evidence to indicate a drop in average daily production?


The braking ability of two types of automobiles was compared. Random samples of 64 auto- 
mobiles were tested for each type. The recorded measurement was the distance required to stop 
when the brakes were applied at 40 miles per hour. The computed sample means and variances 
were as follows: 


Do the data provide sufficient evidence to indicate a difference in the mean stopping distances 
of the two types of automobiles? Give the attained significance level. 


The stability of measurements of the characteristics of a manufactured product is important
in maintaining product quality. In fact, it is sometimes better to obtain small variation in the
measured value of some important characteristic of a product and have the process mean
slightly off target than to get wide variation with a mean value that perfectly fits requirements.
The latter situation may produce a higher percentage of defective product than the former. A
manufacturer of light bulbs suspected that one of his production lines was producing bulbs
with a high variation in length of life. To test this theory, he compared the lengths of life of
$n = 50$ bulbs randomly sampled from the suspect line and $n = 50$ from a line that seemed
to be in control. The sample means and variances for the two samples were as shown in the
following table.

Suspect Line            Line in Control
$\bar{y}_1 = 1,520$     $\bar{y}_2 = 1,476$
$s_1^2 = 92,000$        $s_2^2 = 37,000$

a Do the data provide sufficient evidence to indicate that bulbs produced by the suspect line
possess a larger variance in length of life than those produced by the line that is assumed
to be in control? Use $\alpha = .05$.

b Find the approximate observed significance level for the test and interpret its value.


A pharmaceutical manufacturer purchases a particular material from two different suppliers.
The mean level of impurities in the raw material is approximately the same for both suppliers,
but the manufacturer is concerned about the variability of the impurities from shipment to
shipment. If the level of impurities tends to vary excessively for one source of supply, it
could affect the quality of the pharmaceutical product. To compare the variation in percentage
impurities for the two suppliers, the manufacturer selects ten shipments from each of the two
suppliers and measures the percentage of impurities in the raw material for each shipment. The
sample means and variances are shown in the accompanying table.

Supplier A            Supplier B
$\bar{y}_1 = 1.89$    $\bar{y}_2 = 1.85$
$s_1^2 = .273$        $s_2^2 = .094$
$n_1 = 10$            $n_2 = 10$

a Do the data provide sufficient evidence to indicate a difference in the variability of the
shipment impurity levels for the two suppliers? Test using $\alpha = .10$. Based on the results
of your test, what recommendation would you make to the pharmaceutical manufacturer?

b Find a 90% confidence interval for $\sigma_B^2$ and interpret your results.


The data in the following table give readings in foot-pounds of the impact strength of two kinds
of packaging material, type A and type B. Determine whether the data suggest a difference in
mean strength between the two kinds of material. Test at the $\alpha = .10$ level of significance.

A                         B
1.25                      .89
1.16                      1.01
1.33                      .97
1.15                      .95
1.23                      .94
1.20                      1.02
1.32                      .98
1.28                      1.06
1.21                      .98
$\sum y_i = 11.13$        $\sum y_i = 8.80$
$\bar{y} = 1.237$         $\bar{y} = .978$
$\sum y_i^2 = 13.7973$    $\sum y_i^2 = 8.6240$


How much combustion efficiency should a homeowner expect from an oil furnace? The EPA
states that 80% or higher is excellent, 75% to 79% is good, 70% to 74% is fair, and below 70%
is poor. A home-heating contractor who sells two makes of oil heaters (call them A and B)
decided to compare their mean efficiencies by analyzing the efficiencies of 8 heaters of type
A and 6 of type B. The resulting efficiency ratings in percentages for the 14 heaters are shown
in the accompanying table.

Type A    Type B
72        78
78        76
73        81
69        74
79        82
74        75
69
75

a Do the data provide sufficient evidence to indicate a difference in mean efficiencies for
the two makes of home heaters? Find the approximate p-value for the test and interpret its
value.

b Find a 90% confidence interval for $(\mu_A - \mu_B)$ and interpret the result.


Suppose that $X_1, X_2, \ldots, X_{n_1}$, $Y_1, Y_2, \ldots, Y_{n_2}$, and $W_1, W_2, \ldots, W_{n_3}$ are independent random
samples from normal distributions with respective unknown means $\mu_1$, $\mu_2$, and $\mu_3$ and common
variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$. Suppose that we want to estimate a linear function of the means:
$\theta = a_1\mu_1 + a_2\mu_2 + a_3\mu_3$. Because the maximum-likelihood estimator (MLE) of a function
of parameters is the function of the MLEs of the parameters, the MLE of $\theta$ is
$\hat{\theta} = a_1\bar{X} + a_2\bar{Y} + a_3\bar{W}$.

a What is the standard error of the estimator $\hat{\theta}$?

b What is the distribution of the estimator $\hat{\theta}$?

c If the sample variances are given by $S_1^2$, $S_2^2$, and $S_3^2$, respectively, consider

\[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + (n_3 - 1)S_3^2}{n_1 + n_2 + n_3 - 3}. \]

  i What is the distribution of $(n_1 + n_2 + n_3 - 3)S_p^2/\sigma^2$?

  ii What is the distribution of

\[ T = \frac{\hat{\theta} - \theta}{S_p \sqrt{\dfrac{a_1^2}{n_1} + \dfrac{a_2^2}{n_2} + \dfrac{a_3^2}{n_3}}}\,? \]

d Give a confidence interval for $\theta$ with confidence coefficient $1 - \alpha$.

e Develop a test for $H_0: \theta = \theta_0$ versus $H_a: \theta \neq \theta_0$.


A merchant figures her weekly profit to be a function of three variables: retail sales (denoted
by X), wholesale sales (denoted by Y), and overhead costs (denoted by W). The variables
X, Y, and W are regarded as independent, normally distributed random variables with means
$\mu_1$, $\mu_2$, and $\mu_3$ and variances $\sigma^2$, $a\sigma^2$, and $b\sigma^2$, respectively, for known constants $a$ and $b$
but unknown $\sigma^2$. The merchant's expected profit per week is $\mu_1 + \mu_2 - \mu_3$. If the merchant
has made independent observations of X, Y, and W for the past $n$ weeks, construct a test of
$H_0: \mu_1 + \mu_2 - \mu_3 = k$ against the alternative $H_a: \mu_1 + \mu_2 - \mu_3 \neq k$, for a given constant $k$.
You may specify $\alpha = .05$.


A reading exam is given to the sixth graders at three large elementary schools. The scores
on the exam at each school are regarded as having normal distributions with unknown means
$\mu_1$, $\mu_2$, and $\mu_3$, respectively, and unknown common variance $\sigma^2$ ($\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$). Using
the data in the accompanying table on independent random samples from each school, test to
see if evidence exists of a difference between $\mu_1$ and $\mu_2$. Use $\alpha = .05$.

School I                  School II                 School III
$n_1 = 10$                $n_2 = 10$                $n_3 = 10$
$\sum x_i^2 = 36,950$     $\sum y_i^2 = 25,850$     $\sum w_i^2 = 49,900$
$\bar{x} = 60$            $\bar{y} = 50$            $\bar{w} = 70$

Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function
given by

\[ f(y \mid \theta_1, \theta_2) =
   \begin{cases}
   \left(\dfrac{1}{\theta_1}\right) e^{-(y - \theta_2)/\theta_1}, & y > \theta_2, \\
   0, & \text{elsewhere.}
   \end{cases} \]

Find the likelihood ratio test for testing $H_0: \theta_1 = \theta_{1,0}$ versus $H_a: \theta_1 > \theta_{1,0}$, with $\theta_2$ unknown.


Refer to Exercise 10.129. Find the likelihood ratio test for testing $H_0: \theta_2 = \theta_{2,0}$ versus
$H_a: \theta_2 > \theta_{2,0}$, with $\theta_1$ unknown.


FIGURE 11.1
Plot of data


Introduction


In Chapter 9, we considered several methods for finding estimators of parameters, 
including the methods of moments and maximum likelihood and also methods based 
on sufficient statistics. Another method of estimation, the method of least squares, is 
the topic of this chapter. 

In all our previous discussions of statistical inference, we assumed that the observable
random variables $Y_1, Y_2, \ldots, Y_n$ were independent and identically distributed.
One implication of this assumption is that the expected value of $Y_i$, $E(Y_i)$, is constant
(if it exists). That is, $E(Y_i) = \mu$ does not depend on the value of any other variables.
Obviously, this assumption is unrealistic in many inferential problems. For example,
the mean stopping distance for a particular type of automobile will depend on the
speed that the automobile is traveling; the mean potency of an antibiotic depends on
the amount of time that the antibiotic has been stored; the mean amount of elongation
observed in a metal alloy depends on the force applied and the temperature of the
alloy. In this chapter, we undertake a study of inferential procedures that can be used
when a random variable Y, called the dependent variable, has a mean that is a function
of one or more nonrandom variables $x_1, x_2, \ldots, x_k$, called independent variables.
(In this context, the terms independent and dependent are used in their mathematical
sense. There is no relationship with the probabilistic concept of independent random
variables.)

Many different types of mathematical functions can be used to model a response
that is a function of one or more independent variables. These can be classified into
two categories: deterministic and probabilistic models. For example, suppose that y
and x are related according to the equation

\[ y = \beta_0 + \beta_1 x, \]

where $\beta_0$ and $\beta_1$ are unknown parameters. This model is called a deterministic
mathematical model because it does not allow for any error in predicting y as a function
of x. This model implies that y always takes the value $\beta_0 + \beta_1(5.5)$ whenever x = 5.5.

Suppose that we collect a sample of n values of y corresponding to n different
settings of the independent variable x and that a plot of the data is as shown in
Figure 11.1. It is quite clear from the figure that the expected value of Y may increase
as a linear function of x but that a deterministic model is far from an adequate
description of reality. Repeated experiments when x = 5.5 would yield values of
Y that vary in a random manner. This tells us that the deterministic model is not
an exact representation of the relationship between the two variables. Further, if the
model were used to predict Y when x = 5.5, the prediction would be subject to some
unknown error. This, of course, leads us to the use of statistical methods. Predicting
Y for a given value of x is an inferential process. If the prediction is to be of value in
real life, we need to be able to assess the likelihood of observing prediction errors of
various magnitudes.

FIGURE 11.2
Graph of the probabilistic model $Y = \beta_0 + \beta_1 x + \varepsilon$

In contrast to the deterministic model, statisticians use probabilistic models. For
example, we might represent the responses of Figure 11.1 by the model

\[ E(Y) = \beta_0 + \beta_1 x \]

or, equivalently,

\[ Y = \beta_0 + \beta_1 x + \varepsilon, \]

where $\varepsilon$ is a random variable possessing a specified probability distribution with
mean 0. We think of Y as the sum of a deterministic component $E(Y)$ and a random
component $\varepsilon$. This model accounts for the random behavior of Y exhibited in
Figure 11.1 and provides a more accurate description of reality than the deterministic
model. Further, the properties of the error of prediction for Y can be derived for many
probabilistic models.
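
A small simulation may make the two components concrete. The sketch below assumes normally distributed errors and uses hypothetical values for $\beta_0$, $\beta_1$, and the error standard deviation; the text itself only requires that $\varepsilon$ have mean 0.

```python
import random

def simulate_response(beta0, beta1, x, sigma, rng):
    """One draw from Y = beta0 + beta1 * x + epsilon, with epsilon ~ Normal(0, sigma)."""
    return beta0 + beta1 * x + rng.gauss(0.0, sigma)

# Hypothetical parameter values; normal errors are an assumption of this sketch.
beta0, beta1, sigma = 10.0, 4.0, 2.0
rng = random.Random(11)  # fixed seed so the run is repeatable

x = 5.5
draws = [simulate_response(beta0, beta1, x, sigma, rng) for _ in range(5000)]

expected = beta0 + beta1 * x            # deterministic component E(Y)
sample_mean = sum(draws) / len(draws)   # average of the simulated responses
spread = max(draws) - min(draws)        # repeated responses are not all equal
```

Repeated responses at the same x differ from one another, yet their average settles near the deterministic component $E(Y)$.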

Figure 11.2 presents a graphical representation of the probabilistic model
$Y = \beta_0 + \beta_1 x + \varepsilon$. When x = 5.5, there is a population of possible values of Y. The distribution
of this population is indicated on the main portion of the graph and is centered on
the line $E(Y) = \beta_0 + \beta_1 x$ at the point x = 5.5. This population has a distribution
with mean $\beta_0 + \beta_1(5.5)$ and variance $\sigma^2$, as shown in the magnified version of the
distribution that is boxed in Figure 11.2. When x = 7, there is another population of
possible values for Y. The distribution of this population has the same form as the
distribution of Y-values when x = 5.5 and has the same variance $\sigma^2$, but when x = 7,
the distribution of Y has mean $\beta_0 + \beta_1(7)$. The same is true for each possible value
of the independent variable x. That is, in a regression model, a separate population
of response values exists for each possible setting of the independent variable(s).
These populations all have the same variance, and the shapes of the distributions of the
populations are all the same (see Figure 11.2); however, the mean of each population
depends, through the regression model, on the setting of the independent variable(s).

Scientific and mathematical textbooks are filled with deterministic models of reality.
Indeed, many of the mathematical functions that appear in calculus and physics books
are deterministic mathematical models of nature. For example, Newton's law relating
the force of a moving body to its mass and acceleration,

\[ F = ma, \]

is a deterministic model that, for practical purposes, predicts with little error. In
contrast, other models, such as functions graphically represented in scientific journals
and texts, are often poor. The spatter of points that would give graphic evidence
of their inadequacies, similar to the random behavior of the points in Figure 11.1,
has been de-emphasized, which leads novice scientists to accept the corresponding
“laws” and theories as an exact description of nature.

If deterministic models can be used to predict with negligible error, for all practical 
purposes, we use them. If not, we seek a probabilistic model, which will not be an 
exact characterization of nature but which will enable us to assess the validity of our 
inferences. 


Linear Statistical Models 


Although infinitely many different functions can be used to model the mean value of
the response variable Y as a function of one or more independent variables, we will
concentrate on a set of models called linear statistical models. If Y is the response
variable and x is a single independent variable, it may be reasonable in some situations
to use the model $E(Y) = \beta_0 + \beta_1 x$ for unknown parameter values $\beta_0$ and $\beta_1$. Notice
that in this model $E(Y)$ is a linear function of x (for a given $\beta_0$ and $\beta_1$) and also a
linear function of $\beta_0$ and $\beta_1$ [because $E(Y) = c\beta_0 + d\beta_1$ with c = 1 and d = x]. In
the model $E(Y) = \beta_0 + \beta_1 x^2$, $E(Y)$ is not a linear function of x, but it is a linear
function of $\beta_0$ and $\beta_1$ [because $E(Y) = c\beta_0 + d\beta_1$ with c = 1 and $d = x^2$]. When we
say we have a linear statistical model for Y, we mean that $E(Y)$ is a linear function of
the unknown parameters $\beta_0$ and $\beta_1$ and not necessarily a linear function of x. Thus,
$Y = \beta_0 + \beta_1(\ln x) + \varepsilon$ is a linear model (because ln x takes on known values for each
fixed value of x).
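
The distinction between linearity in the parameters and linearity in x can be checked directly. In the sketch below the function names and numerical values are hypothetical; `g` stands for the known function of x (here $x^2$ or $\ln x$).

```python
import math

def mean_response(beta0, beta1, x, g):
    """E(Y) = beta0 + beta1 * g(x), where g is a known function of x."""
    return beta0 + beta1 * g(x)

def square(x):
    return x * x

x = 3.0
for g in (square, math.log):
    # Linear in the parameters: adding parameter vectors adds the mean responses.
    lhs = mean_response(1.0 + 2.0, 0.5 + 1.5, x, g)
    rhs = mean_response(1.0, 0.5, x, g) + mean_response(2.0, 1.5, x, g)
    assert math.isclose(lhs, rhs)

# Not linear in x when g(x) = x^2: doubling x quadruples the beta1 term.
term_at_x = mean_response(0.0, 1.0, x, square)
term_at_2x = mean_response(0.0, 1.0, 2 * x, square)
```

The additivity check passes for both choices of `g`, while the last two lines show that the response to x itself is not proportional.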

If the model relates $E(Y)$ as a linear function of $\beta_0$ and $\beta_1$ only, the model is
called a simple linear regression model. If more than one independent variable (say,
$x_1, x_2, \ldots, x_k$) are of interest and we model $E(Y)$ by

\[ E(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \]

the model is called a multiple linear regression model. Because $x_1, x_2, \ldots, x_k$ are
regarded as variables with known values, they are assumed to be measured without
error in an experiment. For example, if you think that the mean yield $E(Y)$ is a
function of the variable t, the temperature of a chemical process, you might let $x_1 = t$
and $x_2 = e^t$ and use the model $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ or, equivalently,
$E(Y) = \beta_0 + \beta_1 t + \beta_2 e^t$. Or, if $E(Y)$ is a function of two variables $x_1$ and $x_2$, you might
choose a planar approximation to the true mean response, using the linear model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. Thus, $E(Y)$ is a linear function of $\beta_0$, $\beta_1$, and $\beta_2$ and
represents a plane in the $y, x_1, x_2$ space (see Figure 11.3). Similarly,

\[ E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2 \]

is a linear statistical model, where $E(Y)$ is a second-order polynomial function of the
independent variable x, with $x_1 = x$ and $x_2 = x^2$. This model would be appropriate
for a response that traces a segment of a parabola over the experimental region.

FIGURE 11.3
Plot of $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
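
The temperature model and the second-order polynomial above are both instances of the same multiple linear regression form once the regressors $x_1$ and $x_2$ are computed. A minimal sketch, with hypothetical parameter values and an illustrative function name:

```python
import math

def mean_response(betas, xs):
    """E(Y) = beta0 + beta1*x1 + ... + betak*xk for known regressor values xs."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

# Hypothetical parameter values, for illustration only.
betas = [1.0, 2.0, 0.5]

# Temperature model: x1 = t, x2 = e^t.
t = 1.5
yield_mean = mean_response(betas, [t, math.exp(t)])

# Second-order polynomial in a single variable: x1 = x, x2 = x^2.
x = 3.0
parabola_mean = mean_response(betas, [x, x ** 2])
```

Only the way the regressors are built from t or x changes; the mean function itself stays linear in the betas.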

The expected percentage $E(Y)$ of water in paper during its manufacture could be
represented as a second-order function of the temperature of the dryer, $x_1$, and the
speed of the paper machine, $x_2$. Thus,

\[ E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2, \]

where $\beta_0, \beta_1, \ldots, \beta_5$ are unknown parameters in the model. Geometrically, $E(Y)$
traces a second-order (conic) surface over the $x_1, x_2$ plane (see Figure 11.4).

FIGURE 11.4
Plot of $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$


A linear statistical model relating a random response Y to a set of independent
variables $x_1, x_2, \ldots, x_k$ is of the form

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \]

where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters, $\varepsilon$ is a random variable, and the
variables $x_1, x_2, \ldots, x_k$ assume known values. We will assume that $E(\varepsilon) = 0$
and hence that

\[ E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k. \]


Consider the physical interpretation of the linear model Y. It says that Y is equal
to an expected value, $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ (a function of the independent
variables $x_1, x_2, \ldots, x_k$), plus a random error $\varepsilon$. From a practical point of view, $\varepsilon$
acknowledges our inability to provide an exact model for nature. In repeated
experimentation, Y varies about $E(Y)$ in a random manner because we have failed to include
in our model all of the many variables that may affect Y. Fortunately, many times the
net effect of these unmeasured, and most often unknown, variables is to cause Y to
vary in a manner that may be adequately approximated by an assumption of random
behavior.

In this chapter, we use the method of least squares to derive estimators for the
parameters $\beta_0, \beta_1, \ldots, \beta_k$ in a linear regression model. In many applications, one
or more of these parameters will have meaningful interpretations. For this reason,
we develop inferential methods for an individual $\beta$ parameter and for sets of $\beta$
parameters. If we estimate the parameters $\beta_0, \beta_1, \ldots, \beta_5$ in the model expressing the
expected percentage $E(Y)$ of water in paper as a second-order polynomial in $x_1$ (the
dryer temperature) and $x_2$ (the dryer speed), we will be able to develop methods for
estimating and forming confidence intervals for the value of $E(Y)$ when $x_1$ and $x_2$ take
on specific values. Similarly, we can develop methods for predicting a future value of
Y when the independent variables assume values of practical interest. Sections 11.3
through 11.9 focus on the simple linear regression model, whereas the later sections
deal with multiple linear regression models.

FIGURE 11.5
Fitting a straight line through a set of data points


The Method of Least Squares 


A procedure for estimating the parameters of any linear model, the method of least
squares, can be illustrated simply by fitting a straight line to a set of data points.
Suppose that we wish to fit the model

\[ E(Y) = \beta_0 + \beta_1 x \]

to the set of data points shown in Figure 11.5. [The independent variable x could be
$w^2$ or $(w)^{1/2}$ or $\ln w$, and so on, for some other independent variable w.] That is, we
postulate that $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ possesses some probability distribution with
$E(\varepsilon) = 0$. If $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimators of the parameters $\beta_0$ and $\beta_1$, then $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$
is clearly an estimator of $E(Y)$.

The least-squares procedure for fitting a line through a set of n data points is
similar to the method that we might use if we fit a line by eye; that is, we want the
differences between the observed values and corresponding points on the fitted line to
be “small” in some overall sense. A convenient way to accomplish this, and one
that yields estimators with good properties, is to minimize the sum of squares of the
vertical deviations from the fitted line (see the deviations indicated in Figure 11.5).
Thus, if

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

is the predicted value of the ith y value (when $x = x_i$), then the deviation (sometimes
called the error) of the observed value of $y_i$ from $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is the difference
$y_i - \hat{y}_i$, and the sum of squares of deviations to be minimized is

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2.$$

The quantity SSE is also called the sum of squares for error for reasons that will
subsequently become apparent.

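If SSE possesses a minimum, it will occur for values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that satisfy the
equations $\partial SSE/\partial \hat{\beta}_0 = 0$ and $\partial SSE/\partial \hat{\beta}_1 = 0$. Taking the partial derivatives of SSE
with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and setting them equal to zero, we obtain

$$\frac{\partial SSE}{\partial \hat{\beta}_0} = \frac{\partial \left\{ \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 \right\}}{\partial \hat{\beta}_0} = -\sum_{i=1}^{n} 2[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]$$

$$= -2\left( \sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i \right) = 0$$

and

$$\frac{\partial SSE}{\partial \hat{\beta}_1} = \frac{\partial \left\{ \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 \right\}}{\partial \hat{\beta}_1} = -\sum_{i=1}^{n} 2[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)] x_i$$

$$= -2\left( \sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 \right) = 0.$$

The equations $\partial SSE/\partial \hat{\beta}_0 = 0$ and $\partial SSE/\partial \hat{\beta}_1 = 0$ are called the least-squares equations
for estimating the parameters of a line.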

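The least-squares equations are linear in $\hat{\beta}_0$ and $\hat{\beta}_1$ and hence can be solved
simultaneously. You can verify that the solutions are

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Further, it can be shown that the simultaneous solution for the two least-squares
equations yields values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize SSE. We leave this for you to
prove.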
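The algebra behind the solutions can be sketched as follows. This is a worked derivation added for illustration, using only the two least-squares equations and the notation $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$; it does not address the separate claim that the solution is a minimum.

```latex
% First equation, divided by n, gives the intercept in terms of the slope:
\sum y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum x_i = 0
  \quad\Longrightarrow\quad
  \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.

% Substitute into the second equation, using \sum x_i = n\bar{x}:
\sum x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x})\, n\bar{x} - \hat{\beta}_1 \sum x_i^2 = 0
  \quad\Longrightarrow\quad
  \hat{\beta}_1 \Bigl( \sum x_i^2 - n\bar{x}^2 \Bigr) = \sum x_i y_i - n\bar{x}\bar{y}.

% The two sides are the centered sums of products and squares:
\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y},
  \qquad
  \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2,

% so that
\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.
```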

The expressions

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \quad \text{and} \quad \sum_{i=1}^{n} (x_i - \bar{x})^2$$

that are used to calculate $\hat{\beta}_1$ are often encountered in the development of simple linear
regression models. The first of these is calculated by summing products of x-values
minus their mean and y-values minus their mean. In all subsequent discussions, we
will denote this quantity by $S_{xy}$. Similarly, we will denote the second quantity by $S_{xx}$
because it is calculated by summing products that involve only the x-values.

1. $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$, where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
2. $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.


We illustrate the use of the preceding equations with a simple example. 


EXAMPLE 11.1 Use the method of least squares to fit a straight line to the n = 5 data points given in
Table 11.1.


Table 11.1 Data for Example 11.1

    x     y
   -2     0
   -1     0
    0     1
    1     1
    2     3


Solution We commence computation of the least-squares estimates for the slope and intercept
of the fitted line by constructing Table 11.2. Using the results from the table, we obtain

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{7 - \frac{1}{5}(0)(5)}{10 - \frac{1}{5}(0)^2} = .7,$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{5}{5} - (.7)(0) = 1,$$

and the fitted line is

$$\hat{y} = 1 + .7x.$$


Table 11.2 Calculations for finding the coefficients

    x_i    y_i    x_i y_i    x_i^2
    -2      0        0         4
    -1      0        0         1
     0      1        0         0
     1      1        1         1
     2      3        6         4
    Sum x_i = 0    Sum y_i = 5    Sum x_i y_i = 7    Sum x_i^2 = 10

FIGURE 11.6 Plot of data points and least-squares line for Example 11.1

The five points and the fitted line are shown in Figure 11.6.


In this section, we have determined the least-squares estimators for the parameters
$\beta_0$ and $\beta_1$ in the model $E(Y) = \beta_0 + \beta_1 x$. The simple example used here will reappear
in future sections to illustrate other calculations. Exercises of a more realistic nature
are presented at the ends of the sections, and two examples involving data from actual
experiments are presented and analyzed in Section 11.9. In the next section, we
develop the statistical properties of the least-squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. Subsequent
sections are devoted to using these estimators for a variety of inferential purposes.
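
The calculation in Example 11.1 can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original text: the function names `least_squares_line` and `sse` are illustrative choices, and the data are the five points of Table 11.1.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope, via S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared vertical deviations from the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Data from Table 11.1
xs = [-2, -1, 0, 1, 2]
ys = [0, 0, 1, 1, 3]

b0, b1 = least_squares_line(xs, ys)
print("intercept:", b0, "slope:", b1)
print("SSE at the least-squares line:", sse(xs, ys, b0, b1))
# Any other line, such as one with slope 0.8, gives a larger SSE.
print("SSE at intercept 1, slope 0.8:", sse(xs, ys, 1.0, 0.8))
```

Comparing the SSE of the fitted line with that of a nearby line is a quick numerical check that the solution of the least-squares equations is not improved by a small change in slope.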


Exercises 


11.1 If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least-squares estimates for the intercept and slope in a simple linear
regression model, show that the least-squares equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ always goes through
the point $(\bar{x}, \bar{y})$. [Hint: Substitute $\bar{x}$ for x in the least-squares equation and use the fact that
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.]


11.2 Applet Exercise How can you improve your understanding of what the method of least squares
actually does? Access the applet Fitting a Line Using Least Squares (at
www.thomsonedu.com/statistics/wackerly). The data that appear on the first graph are from Example 11.1.


a What are the slope and intercept of the blue horizontal line? (See the equation above the 
graph.) What is the sum of the squares of the vertical deviations between the points on 
the horizontal line and the observed values of the y's? Does the horizontal line fit the data 
well? Click the button “Display/Hide Error Squares.” Notice that the areas of the yellow 
boxes are equal to the squares of the associated deviations. How does SSE compare to the 
sum of the areas of the yellow boxes? 

b Click the button “Display/Hide Error Squares” so that the yellow boxes disappear. Place
the cursor on the right end of the blue line. Click and hold the mouse button and drag the line
so that the slope of the blue line becomes negative. What do you notice about the lengths
of the vertical red lines? Did SSE increase or decrease? Does the line with negative slope
appear to fit the data well?

c Drag the line so that the slope is near 0.8. What happens as you move the slope closer to
0.7? Did SSE increase or decrease? When the blue line is moved, it is actually pivoting
around a fixed point. What are the coordinates of that pivot point? Are the coordinates of
the pivot point consistent with the result you derived in Exercise 11.1?

d Drag the blue line until you obtain a line that visually fits the data well. What are the slope
and intercept of the line that you visually fit to the data? What is the value of SSE for
the line that you visually fit to the data? Click the button “Find Best Model” to obtain the
least-squares line. How does the value of SSE compare to the SSE associated with the line
that you visually fit to the data? How do the slope and intercept of the line that you visually
fit to the data compare to the slope and intercept of the least-squares line?


11.3 Fit a straight line to the five data points in the accompanying table. Give the estimates of $\beta_0$
and $\beta_1$. Plot the points and sketch the fitted line as a check on the calculations.


y |  3.0   2.0   1.0   1.0   0.5

x | -2.0  -1.0   0.0   1.0   2.0


11.4 Auditors are often required to compare the audited (or current) value of an inventory item with
the book (or listed) value. If a company is keeping its inventory and books up to date, there
should be a strong linear relationship between the audited and book values. A company sampled
ten inventory items and obtained the audited and book values given in the accompanying table.
Fit the model $Y = \beta_0 + \beta_1 x + \varepsilon$ to these data.


Item    Audit Value (y_i)    Book Value (x_i)

 1             9                  10
 2            14                  12
 3             7                   9
 4            29                  27
 5            45                  47
 6           109                 112
 7            40                  36
 8           238                 241
 9            60                  59
10           170                 167

a What is your estimate for the expected change in audited value for a one-unit change in 
book value? 


b If the book value is x = 100, what would you use to estimate the audited value? 


11.5 What did housing prices look like in the “good old days”? The median sale prices for new
single-family houses are given in the accompanying table for the years 1972 through 1979.¹
Letting Y denote the median sales price and x the year (using integers 1, 2, ..., 8), fit the model
$Y = \beta_0 + \beta_1 x + \varepsilon$. What can you conclude from the results?


Year    Median Sales Price (× 1000)


1972 (1) $27.6 
1973 (2) $32.5 
1974 (3) $35.9 
1975 (4) $39.3 
1976 (5) $44.2 
1977 (6) $48.8 
1978 (7) $55.7 
1979 (8) $62.9 


1. Source: Adapted from Time, 23 July 1979, p. 67.

11.6 Applet Exercise Refer to Exercises 11.2 and 11.5. The data from Exercise 11.5 appear in the
graph under the heading “Another Example” in the applet Fitting a Line Using Least Squares.
Again, the horizontal blue line that initially appears on the graph is a line with 0 slope.


a What is the intercept of the line with 0 slope? What is the value of SSE for the line with 0 
slope? 

b Do you think that a line with negative slope will fit the data well? If the line is dragged to
produce a negative slope, does SSE increase or decrease?

c Drag the line to obtain a line that visually fits the data well. What is the equation of the
line that you obtained? What is the value of SSE? What happens to SSE if the slope (and
intercept) of the line is changed from the one that you visually fit?

d Is the line that you visually fit the least-squares line? Click on the button “Find Best Model”
to obtain the line with smallest SSE. How do the slope and intercept of the least-squares
line compare to the slope and intercept of the line that you visually fit in part (c)? How do
the SSEs compare?


e Refer to part (a). What is the y-coordinate of the point around which the blue line pivots? 


f Click on the button “Display/Hide Error Squares.” What do you observe about the size of the 
yellow squares that appear on the graph? What is the sum of the areas of the yellow squares? 


11.7 Applet Exercise Move down to the portion of the applet labeled “Curvilinear Relationship”
associated with the applet Fitting a Line Using Least Squares.


a Does it seem like a straight line will provide a good fit to the data in the graph? Does it
seem that there is likely to be some functional relationship between E(Y) and x?

b Is there any straight line that fits the data better than the one with 0 slope?

c If you fit a line to a data set and obtain that the best fitting line has 0 slope, does that mean
that there is no functional relationship between E(Y) and the independent variable? Why?


11.8 Laboratory experiments designed to measure LC50 (lethal concentration killing 50% of the test
species) values for the effect of certain toxicants on fish are run by two different methods. One
method has water continuously flowing through laboratory tanks, and the other method has
static water conditions. For purposes of establishing criteria for toxicants, the Environmental
Protection Agency (EPA) wants to adjust all results to the flow-through condition. Thus, a
model is needed to relate the two types of observations. Observations on toxicants examined
under both static and flow-through conditions yielded the data in the accompanying table
(measurements in parts per million, ppm). Fit the model $Y = \beta_0 + \beta_1 x + \varepsilon$.


Toxicant    LC50 Flow-Through (y)    LC50 Static (x)


1 23.00 39.00
2 22.30 37.50
3 9.40 22.20
4 9.70 17.50
5 .15 .64
6 .28 .45
7 „ТЭ 2.62 
8 21 2.36 
9 28.00 32.00
10 .39 .77


а What interpretation can you give to the results? 


b Estimate the flow-through value for a toxicant with an LC50 static value of x = 12 ppm.

11.9 Information about eight four-cylinder automobiles judged to be among the most fuel efficient
in 2006 is given in the following table. Engine sizes are in total cylinder volume, measured in
liters (L).


Car               Cylinder Volume (x)    Horsepower (y)

Honda Civic              1.8                  51
Toyota Prius             1.5                  51
VW Golf                  2.0                 115
VW Beetle                2.5                 150
Toyota Corolla           1.8                 126
VW Jetta                 2.5                 150
Mini Cooper              1.6                 118
Toyota Yaris             1.5                 106

a Plot the data points on graph paper.
b Find the least-squares line for the data.
c Graph the least-squares line to see how well it fits the data.
d Use the least-squares line to estimate the mean horsepower rating for a fuel-efficient
automobile with cylinder volume 1.9 L.
11.10 Suppose that we have postulated the model

$$Y_i = \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where the $\varepsilon_i$'s are independent and identically distributed random variables with $E(\varepsilon_i) = 0$.
Then $\hat{y}_i = \hat{\beta}_1 x_i$ is the predicted value of y when $x = x_i$ and $SSE = \sum_{i=1}^{n} [y_i - \hat{\beta}_1 x_i]^2$. Find
the least-squares estimator of $\beta_1$. (Notice that the equation $y = \beta x$ describes a straight line
passing through the origin. The model just described often is called the no-intercept model.)


11.11 Some data obtained by C. E. Marcellari² on the height x and diameter y of shells appear in the
following table. If we consider the model

$$E(Y) = \beta_1 x,$$

then the slope $\beta_1$ is the ratio of the mean diameter to the height. Use the following data and
the result of Exercise 11.10 to obtain the least-squares estimate of the mean diameter to height
ratio.


Specimen      Diameter (y)    Height (x)

OSU 36651         185             78
OSU 36652         194             65
OSU 36653         173             77
OSU 36654         200             76
OSU 36655         179             72
OSU 36656         213             76
OSU 36657         134             75
OSU 36658         191             77
OSU 36659         177             69
OSU 36660         199             65


2. Source: Carlos E. Marcellari, “Revision of Serpulids of the Genus Rotularia (Annelida) at Seymour
Island (Antarctic Peninsula) and Their Value in Stratigraphy,” Journal of Paleontology 58(4) (1984).

11.12 Processors usually preserve cucumbers by fermenting them in a low-salt brine (6% to 9%
sodium chloride) and then storing them in a high-salt brine until they are used by processors to
produce various types of pickles. The high-salt brine is needed to retard softening of the pickles
and to prevent freezing when they are stored outside in northern climates. Data showing the
reduction in firmness of pickles stored over time in a low-salt brine (2% to 3%) are given in
the accompanying table.


Weeks (x) in Storage at 72°F |  0     4     14    32    52

Firmness (y) in pounds       | 19.8  16.5  12.8   8.1   7.5


a Fit a least-squares line to the data.


b As a check on your calculations, plot the five data points and graph the line. Does the line
appear to provide a good fit to the data points?


c Use the least-squares line to estimate the mean firmness of pickles stored for 20 weeks.


Text not available due to copyright restrictions 


11.14 J. H. Matis and T. E. Wehrly⁵ report the following table of data on the proportion of green
sunfish that survive a fixed level of thermal pollution for varying lengths of time.


Proportion of Survivors (y)    Scaled Time (x)

        1.00                       .10
         .95                       .15
         .95                       .20
         .90                       .25
         .85                       .30
         .70                       .35
         .65                       .40
         .60                       .45
         .55                       .50
         .40                       .55


a Fit the linear model $Y = \beta_0 + \beta_1 x + \varepsilon$. Give your interpretation.
b Plot the points and graph the result of part (a). Does the line fit through the points?


3. Source: R. W. Buescher, J. M. Hudson, J. R. Adams, and D. H. Wallace, “Calcium Makes It Possible
to Store Cucumber Pickles in Low-Salt Brine,” Arkansas Farm Research 30(4) (1981).


Text not available due to copyright restrictions


5. Source: J. H. Matis and T. E. Wehrly, “Stochastic Models of Compartmental Systems,” Biometrics 35(1)
(1979): 199-220.


Properties of the Least-Squares Estimators:
Simple Linear Regression


We need to determine the statistical properties of least-squares estimators if we wish to
use them to make statistical inferences. In this section, we show that the least-squares
estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ for the parameters in the simple linear model

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

are unbiased estimators of their respective parameter values. We also derive the
variances of these estimators and, under the assumption that the error term $\varepsilon$ is normally
distributed, show that $\hat{\beta}_0$ and $\hat{\beta}_1$ have normal sampling distributions. Corresponding
results applicable to the multiple linear regression model are presented without proof
in Section 11.11.

Recall that $\varepsilon$ was previously assumed to be a random variable with $E(\varepsilon) = 0$. We
now add the assumption that $V(\varepsilon) = \sigma^2$. That is, we are assuming that the difference
between the random variable Y and $E(Y) = \beta_0 + \beta_1 x$ is distributed about zero with
a variance that does not depend on x. Notice that $V(Y) = V(\varepsilon) = \sigma^2$ because the
other terms in the linear model are constants. (An unbiased estimator for the variance
$\sigma^2$ of the error term in the model is also provided in this section.)

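Assume that n independent observations are to be made on this model so that
before sampling we have n independent random variables of the form

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

From Section 11.3, we know that

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

which can be written as

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})Y_i - \bar{Y} \sum_{i=1}^{n} (x_i - \bar{x})}{S_{xx}}.$$

Then, because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, we have

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})Y_i}{S_{xx}}.$$

Because all summations in the following discussion will be summed from i = 1 to
n, we will simplify our notation by omitting the variable of summation and its index.
Now let us find the expected value and variance of $\hat{\beta}_1$.

From the expectation theorems developed in Section 5.8, we have

$$E(\hat{\beta}_1) = E\left[ \frac{\sum (x_i - \bar{x})Y_i}{S_{xx}} \right] = \frac{\sum (x_i - \bar{x})E(Y_i)}{S_{xx}} = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1 x_i)}{S_{xx}}$$

$$= \beta_0 \frac{\sum (x_i - \bar{x})}{S_{xx}} + \beta_1 \frac{\sum (x_i - \bar{x})x_i}{S_{xx}}.$$

Because $\sum (x_i - \bar{x}) = 0$ and $S_{xx} = \sum (x_i - \bar{x})^2 = \sum (x_i - \bar{x})x_i$, we have

$$E(\hat{\beta}_1) = 0 + \beta_1 \frac{S_{xx}}{S_{xx}} = \beta_1.$$

Thus, $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.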
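To find $V(\hat{\beta}_1)$, we use Theorem 5.12. Recall that $Y_1, Y_2, \ldots, Y_n$ are independent
and, therefore,

$$V(\hat{\beta}_1) = V\left[ \frac{\sum (x_i - \bar{x})Y_i}{S_{xx}} \right] = \left[ \frac{1}{S_{xx}} \right]^2 \sum V[(x_i - \bar{x})Y_i] = \left[ \frac{1}{S_{xx}} \right]^2 \sum (x_i - \bar{x})^2 V(Y_i).$$

Because $V(Y_i) = \sigma^2$, for i = 1, 2, ..., n,

$$V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}.$$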

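Now let us find the expected value and variance of $\hat{\beta}_0$, where $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$. From
Theorem 5.12, we have

$$V(\hat{\beta}_0) = V(\bar{Y}) + \bar{x}^2 V(\hat{\beta}_1) - 2\bar{x}\,\mathrm{Cov}(\bar{Y}, \hat{\beta}_1).$$

Consequently, we must find $V(\bar{Y})$ and $\mathrm{Cov}(\bar{Y}, \hat{\beta}_1)$ in order to obtain $V(\hat{\beta}_0)$. Because
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, we see that

$$\bar{Y} = \frac{1}{n} \sum Y_i = \beta_0 + \beta_1 \bar{x} + \bar{\varepsilon}.$$

Thus,

$$E(\bar{Y}) = \beta_0 + \beta_1 \bar{x} + E(\bar{\varepsilon}) = \beta_0 + \beta_1 \bar{x},$$

and

$$V(\bar{Y}) = V(\bar{\varepsilon}) = \left( \frac{1}{n} \right) V(\varepsilon_1) = \frac{\sigma^2}{n}.$$

To find $\mathrm{Cov}(\bar{Y}, \hat{\beta}_1)$, rewrite the expression for $\hat{\beta}_1$ as

$$\hat{\beta}_1 = \sum c_i Y_i,$$

where

$$c_i = \frac{x_i - \bar{x}}{S_{xx}}.$$

(Notice that $\sum c_i = 0$.) Then,

$$\mathrm{Cov}(\bar{Y}, \hat{\beta}_1) = \mathrm{Cov}\left[ \sum \left( \frac{1}{n} \right) Y_i, \sum c_i Y_i \right],$$

and using Theorem 5.12,

$$\mathrm{Cov}(\bar{Y}, \hat{\beta}_1) = \sum \left( \frac{c_i}{n} \right) V(Y_i) + \sum\sum_{i \neq j} \left( \frac{c_j}{n} \right) \mathrm{Cov}(Y_i, Y_j).$$

Because $Y_i$ and $Y_j$, where $i \neq j$, are independent, $\mathrm{Cov}(Y_i, Y_j) = 0$. Also, $V(Y_i) = \sigma^2$
and, hence,

$$\mathrm{Cov}(\bar{Y}, \hat{\beta}_1) = \frac{\sigma^2}{n} \sum c_i = \frac{\sigma^2}{n} \sum \left( \frac{x_i - \bar{x}}{S_{xx}} \right) = 0.$$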
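Returning to our original task of finding the expected value and variance of
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$, we apply expectation theorems to obtain

$$E(\hat{\beta}_0) = E(\bar{Y}) - E(\hat{\beta}_1)\bar{x} = \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0.$$

Thus, we have shown that both $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators of their respective
parameters.

Because we have derived $V(\bar{Y})$, $V(\hat{\beta}_1)$, and $\mathrm{Cov}(\bar{Y}, \hat{\beta}_1)$, we are ready to find
$V(\hat{\beta}_0)$. As previously established by using Theorem 5.12,

$$V(\hat{\beta}_0) = V(\bar{Y}) + \bar{x}^2 V(\hat{\beta}_1) - 2\bar{x}\,\mathrm{Cov}(\bar{Y}, \hat{\beta}_1).$$

Substituting the values for $V(\bar{Y})$, $V(\hat{\beta}_1)$, and $\mathrm{Cov}(\bar{Y}, \hat{\beta}_1)$, we obtain

$$V(\hat{\beta}_0) = \frac{\sigma^2}{n} + \bar{x}^2 \left( \frac{\sigma^2}{S_{xx}} \right) - 0 = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) = \frac{\sigma^2 \sum x_i^2}{n S_{xx}}.$$

Further (see Exercise 11.21), Theorem 5.12 can be employed to show that

$$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\bar{x}\sigma^2}{S_{xx}}.$$

Notice that $\hat{\beta}_0$ and $\hat{\beta}_1$ are correlated (and therefore dependent) unless $\bar{x} = 0$.

All the quantities necessary to determine the values of the variances and
covariances above have already been calculated in the course of obtaining the values for $\hat{\beta}_0$
and $\hat{\beta}_1$.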


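EXAMPLE 11.2 Find the variances of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ for Example 11.1.


Solution In Example 11.1 (see the calculations for the denominator of $\hat{\beta}_1$), we found that

$$n = 5, \qquad \sum x_i = 0, \qquad \sum x_i^2 = 10, \qquad S_{xx} = 10.$$

It follows that $\bar{x} = 0$,

$$V(\hat{\beta}_0) = \frac{\sigma^2 \sum x_i^2}{n S_{xx}} = \frac{\sigma^2 (10)}{5(10)} = \left( \frac{1}{5} \right) \sigma^2,$$

and

$$V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}} = \frac{\sigma^2}{10} = \left( \frac{1}{10} \right) \sigma^2.$$

Notice that $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = 0$ in this case since $\sum x_i = 0$.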
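
The multipliers of $\sigma^2$ in Example 11.2 depend only on the x-values, so they are easy to compute directly. The sketch below is illustrative rather than part of the original text: the function name `variance_multipliers` is hypothetical, the names `c00`, `c11`, `c01` follow the notation this section adopts for these constants, and the second set of x-values (1 through 5) is an invented illustration of the case where $\bar{x} \neq 0$.

```python
def variance_multipliers(xs):
    """Return (c00, c11, c01) with V(b0) = c00*sigma^2, V(b1) = c11*sigma^2,
    and Cov(b0, b1) = c01*sigma^2 for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    c00 = sum(x * x for x in xs) / (n * s_xx)
    c11 = 1.0 / s_xx
    c01 = -x_bar / s_xx
    return c00, c11, c01


# x-values of Example 11.1: centered at zero, so the covariance vanishes.
c00, c11, c01 = variance_multipliers([-2, -1, 0, 1, 2])
print(c00, c11, c01)

# Hypothetical uncentered x-values: same S_xx, but the estimators are now correlated.
d00, d11, d01 = variance_multipliers([1, 2, 3, 4, 5])
print(d00, d11, d01)
```

Shifting the x-values leaves the slope variance unchanged but inflates the intercept variance and makes the covariance negative, which is the dependence on $\bar{x}$ noted in the text.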


The preceding expressions give the variances for the least-squares estimators in
terms of $\sigma^2$, the variance of the error term $\varepsilon$. Usually the value of $\sigma^2$ is unknown, and
we will need to make use of the sample observations to estimate $\sigma^2$. If $\bar{Y}$ is used to
estimate the mean, we previously used

$$\left( \frac{1}{n-1} \right) \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

to estimate the population variance $\sigma^2$. Because we are now using $\hat{Y}_i$ to estimate
$E(Y_i)$, it seems natural to base an estimate of $\sigma^2$ on $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. Indeed,
we will show that

$$S^2 = \left( \frac{1}{n-2} \right) \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \left( \frac{1}{n-2} \right) SSE$$

provides an unbiased estimator for $\sigma^2$. Notice that the 2 occurring in the denominator
of $S^2$ corresponds to the number of $\beta$ parameters estimated in the model.

Because

$$E(S^2) = E\left[ \left( \frac{1}{n-2} \right) SSE \right] = \left( \frac{1}{n-2} \right) E(SSE),$$

it is necessary to find E(SSE) in order to verify that $E(S^2) = \sigma^2$.


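Notice that

$$E(SSE) = E\left[ \sum (Y_i - \hat{Y}_i)^2 \right] = E\left[ \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \right]$$

$$= E\left[ \sum (Y_i - \bar{Y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i)^2 \right]$$

$$= E\left[ \sum [(Y_i - \bar{Y}) - \hat{\beta}_1 (x_i - \bar{x})]^2 \right]$$

$$= E\left[ \sum (Y_i - \bar{Y})^2 + \hat{\beta}_1^2 \sum (x_i - \bar{x})^2 - 2\hat{\beta}_1 \sum (x_i - \bar{x})(Y_i - \bar{Y}) \right].$$

Because $\sum (x_i - \bar{x})(Y_i - \bar{Y}) = \sum (x_i - \bar{x})^2 \hat{\beta}_1$, the last two terms in the expectation
combine to give $-\hat{\beta}_1^2 \sum (x_i - \bar{x})^2$. Also,

$$\sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2,$$

and, therefore,

$$E\left[ \sum (Y_i - \hat{Y}_i)^2 \right] = E\left[ \sum Y_i^2 - n\bar{Y}^2 - \hat{\beta}_1^2 S_{xx} \right] = \sum E(Y_i^2) - nE(\bar{Y}^2) - S_{xx} E(\hat{\beta}_1^2).$$

Noting that, for any random variable U, $E(U^2) = V(U) + [E(U)]^2$, we see that

$$E\left[ \sum (Y_i - \hat{Y}_i)^2 \right] = \sum \{V(Y_i) + [E(Y_i)]^2\} - n\{V(\bar{Y}) + [E(\bar{Y})]^2\} - S_{xx}\{V(\hat{\beta}_1) + [E(\hat{\beta}_1)]^2\}$$

$$= n\sigma^2 + \sum (\beta_0 + \beta_1 x_i)^2 - n\left[ \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2 \right] - S_{xx}\left[ \frac{\sigma^2}{S_{xx}} + \beta_1^2 \right].$$

This expression simplifies to $(n - 2)\sigma^2$. Thus, we find that an unbiased estimator of
$\sigma^2$ is given by

$$S^2 = \left( \frac{1}{n-2} \right) \sum (Y_i - \hat{Y}_i)^2 = \left( \frac{1}{n-2} \right) SSE.$$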
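
The simplification to $(n - 2)\sigma^2$ can be checked term by term. The following worked step is added for illustration; it uses only the expression for E(SSE) obtained from $E(U^2) = V(U) + [E(U)]^2$ and the definition $S_{xx} = \sum (x_i - \bar{x})^2$.

```latex
% Collect the sigma^2 terms and the remaining (non-random) terms separately:
E(SSE) = \underbrace{n\sigma^2 - \sigma^2 - \sigma^2}_{(n-2)\sigma^2}
  + \Bigl[ \sum (\beta_0 + \beta_1 x_i)^2 - n(\beta_0 + \beta_1 \bar{x})^2 - \beta_1^2 S_{xx} \Bigr].

% With a_i = \beta_0 + \beta_1 x_i and \bar{a} = \beta_0 + \beta_1 \bar{x},
% the identity \sum a_i^2 - n\bar{a}^2 = \sum (a_i - \bar{a})^2 gives
\sum (\beta_0 + \beta_1 x_i)^2 - n(\beta_0 + \beta_1 \bar{x})^2
  = \beta_1^2 \sum (x_i - \bar{x})^2 = \beta_1^2 S_{xx},

% so the bracketed term is zero and
E(SSE) = (n - 2)\sigma^2.
```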


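One task remains, finding an easy way to calculate $\sum (y_i - \hat{y}_i)^2 = SSE$. In Exercise
11.15(a), you will show that a computing formula for SSE is given by

$$SSE = \sum_{i=1}^{n} (y_i - \bar{y})^2 - \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$= S_{yy} - \hat{\beta}_1 S_{xy}, \quad \text{where } S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$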


EXAMPLE 11.3 Estimate $\sigma^2$ from the data given in Example 11.1.


Solution For these data, n = 5 and we have already determined that

$$\sum y_i = 5, \qquad S_{xy} = 7, \qquad \hat{\beta}_1 = .7.$$

It is easily determined that $\sum y_i^2 = 11$ and that

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n(\bar{y})^2 = 11 - 5(1)^2 = 6.0.$$

Therefore,

$$SSE = S_{yy} - \hat{\beta}_1 S_{xy} = 6.0 - (.7)(7) = 1.1,$$

and

$$s^2 = \frac{SSE}{n - 2} = \frac{1.1}{5 - 2} = \frac{1.1}{3} = .367.$$
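
The computing formula used in Example 11.3 can be checked against the direct sum of squared residuals. This is a minimal sketch under the section's definitions; the function name `sse_two_ways` is an illustrative choice and the data are those of Table 11.1.

```python
def sse_two_ways(xs, ys):
    """Return (sse_shortcut, sse_direct, s2) for a simple linear regression fit.

    sse_shortcut uses SSE = S_yy - b1 * S_xy; sse_direct sums squared residuals.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse_shortcut = s_yy - b1 * s_xy
    sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse_shortcut / (n - 2)  # unbiased estimator of sigma^2
    return sse_shortcut, sse_direct, s2


sse_shortcut, sse_direct, s2 = sse_two_ways([-2, -1, 0, 1, 2], [0, 0, 1, 1, 3])
print(sse_shortcut, sse_direct, s2)
```

Both routes should agree up to floating-point rounding, with the shortcut needing only the three sums $S_{xx}$, $S_{xy}$, and $S_{yy}$.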


These derivations establish the means and variances of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$
and show that $S^2 = SSE/(n - 2)$ is an unbiased estimator for the parameter $\sigma^2$. Thus
far, the only assumptions that we have made about the error term $\varepsilon$ in the model
$Y = \beta_0 + \beta_1 x + \varepsilon$ are that $E(\varepsilon) = 0$ and that $V(\varepsilon) = \sigma^2$, independent of x. The form
of the sampling distributions for $\hat{\beta}_0$ and $\hat{\beta}_1$ depends on the distribution of the error
term $\varepsilon$. Because of the common occurrence of the normal distribution in nature, it is
often reasonable to assume that $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2$.

If this assumption of normality is warranted, it follows that $Y_i$ is normally distributed
with mean $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$. Because both $\hat{\beta}_0$ and $\hat{\beta}_1$ are linear functions of
$Y_1, Y_2, \ldots, Y_n$, the estimators are normally distributed, with means and variances as
previously derived. Further, if the assumption of normality is warranted, it follows that

$$\frac{(n - 2)S^2}{\sigma^2} = \frac{SSE}{\sigma^2}$$

has a $\chi^2$ distribution with n - 2 degrees of freedom (df). (The proof of this result is
omitted.)

As you will subsequently see, the assumption of normality of the distribution of
the error term $\varepsilon$ and the resulting normal distributions for $\hat{\beta}_0$ and $\hat{\beta}_1$ will allow us to
develop tests and confidence intervals based on the t distribution. The results of this
section are summarized here because of their importance to discussions in subsequent
sections. Notice that $V(\hat{\beta}_0)$, $V(\hat{\beta}_1)$, and $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)$ are all constant multiples of $\sigma^2$.
Because $V(\hat{\beta}_i) = \mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_i)$, we will unify notation and provide consistency with
the later sections of this chapter if we use the notation $V(\hat{\beta}_0) = c_{00}\sigma^2$, $V(\hat{\beta}_1) = c_{11}\sigma^2$,
and $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = c_{01}\sigma^2$.


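Properties of the Least-Squares Estimators; Simple Linear Regression
1. The estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased; that is, $E(\hat{\beta}_i) = \beta_i$, for i = 0, 1.
2. $V(\hat{\beta}_0) = c_{00}\sigma^2$, where $c_{00} = \sum x_i^2 / (n S_{xx})$.
3. $V(\hat{\beta}_1) = c_{11}\sigma^2$, where $c_{11} = 1/S_{xx}$.
4. $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = c_{01}\sigma^2$, where $c_{01} = -\bar{x}/S_{xx}$.
5. An unbiased estimator of $\sigma^2$ is $S^2 = SSE/(n - 2)$, where $SSE = S_{yy} - \hat{\beta}_1 S_{xy}$
and $S_{yy} = \sum (y_i - \bar{y})^2$.

If, in addition, the $\varepsilon_i$, for i = 1, 2, ..., n are normally distributed,

6. Both $\hat{\beta}_0$ and $\hat{\beta}_1$ are normally distributed.
7. The random variable $\dfrac{(n - 2)S^2}{\sigma^2}$ has a $\chi^2$ distribution with n - 2 df.
8. The statistic $S^2$ is independent of both $\hat{\beta}_0$ and $\hat{\beta}_1$.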
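
Properties 1 through 5 can be checked empirically by simulation. The sketch below is not from the original text and makes several assumptions for illustration: the true values $\beta_0 = 1$, $\beta_1 = .7$, $\sigma = 1$ are hypothetical, the errors are drawn from a normal distribution, the x-values are those of Example 11.1, and the helper names are arbitrary. Averages over many simulated samples should land close to the theoretical values, though not exactly on them.

```python
import random


def fit_line(xs, ys):
    """Return (b0, b1, s2): least-squares intercept, slope, and S^2 = SSE/(n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / (n - 2)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


random.seed(0)                          # fixed seed so the run is repeatable
xs = [-2, -1, 0, 1, 2]                  # x-values of Example 11.1
beta0, beta1, sigma = 1.0, 0.7, 1.0     # hypothetical true parameter values
reps = 20000

b0_draws, b1_draws, s2_draws = [], [], []
for _ in range(reps):
    ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]
    b0, b1, s2 = fit_line(xs, ys)
    b0_draws.append(b0)
    b1_draws.append(b1)
    s2_draws.append(s2)

# Theory for these x-values: E(b0) = 1, E(b1) = .7, V(b0) = sigma^2/5,
# V(b1) = sigma^2/10, E(S^2) = sigma^2.
print("mean b0:", mean(b0_draws), " var b0:", variance(b0_draws))
print("mean b1:", mean(b1_draws), " var b1:", variance(b1_draws))
print("mean S^2:", mean(s2_draws))
```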


Exercises

11.15 a Derive the following identity:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

$$= \sum_{i=1}^{n} (y_i - \bar{y})^2 - \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = S_{yy} - \hat{\beta}_1 S_{xy}.$$

Notice that this provides an easier computational method of finding SSE.


b Use the computational formula for SSE derived in part (a) to prove that $SSE \leq S_{yy}$.
[Hint: $\hat{\beta}_1 = S_{xy}/S_{xx}$.]

11.16 An experiment was conducted to observe the effect of an increase in temperature on the potency
of an antibiotic. Three 1-ounce portions of the antibiotic were stored for equal lengths of time
at each of the following Fahrenheit temperatures: 30°, 50°, 70°, and 90°. The potency readings
observed at the end of the experimental period were as shown in the following table.


Potency Readings (y) | 38, 43, 29    32, 26, 33    19, 27, 23    14, 19, 21

Temperature (x)      | 30°           50°           70°           90°


a Find the least-squares line appropriate for these data.
b Plot the points and graph the line as a check on your calculations.
c Calculate $S^2$.


11.17 a Calculate SSE and $S^2$ for Exercise 11.5.


b It is sometimes convenient, for computational purposes, to have x-values spaced
symmetrically and equally about zero. The x-values can be rescaled (or coded) in any convenient
manner, with no loss of information in the statistical analysis. Refer to Exercise 11.5. Code
the x-values (originally given on a scale of 1 to 8) by using the formula

$$x^* = \frac{x - 4.5}{.5}.$$

Then fit the model $Y = \beta_0^* + \beta_1^* x^* + \varepsilon$. Calculate SSE. (Notice that the $x^*$-values are
integers symmetrically spaced about zero.) Compare the SSE with the value obtained in
part (a).

11.18 a Calculate SSE and $S^2$ for Exercise 11.8.


b Refer to Exercise 11.8. Code the x-values in a convenient manner and fit a simple linear
model to the LC50 measurements presented there. Compute SSE and compare your answer
to the result of part (a).


11.19 A study was conducted to determine the effects of sleep deprivation on subjects’ ability to
solve simple problems. The amount of sleep deprivation varied over 8, 12, 16, 20, and 24 hours
without sleep. A total of ten subjects participated in the study, two at each sleep-deprivation
level. After his or her specified sleep-deprivation period, each subject was administered a set
of simple addition problems, and the number of errors was recorded. The results shown in the
following table were obtained.


Number of Errors (y)              | 8, 6    6, 10    8, 14    14, 12    16, 12

Number of Hours without Sleep (x) | 8       12       16       20        24


a Find the least-squares line appropriate to these data.
b Plot the points and graph the least-squares line as a check on your calculations.
c Calculate $S^2$.


11.20 Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent normal random variables with $E(Y_i) = \beta_0 + \beta_1 x_i$
and $V(Y_i) = \sigma^2$, for i = 1, 2, ..., n. Show that the maximum-likelihood estimators (MLEs)
of $\beta_0$ and $\beta_1$ are the same as the least-squares estimators of Section 11.3.


11.21 Under the assumptions of Exercise 11.20, find $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)$. Use this answer to show that $\hat{\beta}_0$
and $\hat{\beta}_1$ are independent if $\sum_{i=1}^{n} x_i = 0$. [Hint: $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = \mathrm{Cov}(\bar{Y} - \hat{\beta}_1 \bar{x}, \hat{\beta}_1)$. Use Theorem
5.12 and the results of this section.]


11.22 Under the assumptions of Exercise 11.20, find the MLE of $\sigma^2$.


Suppose that an engineer has fit the model

$$Y = \beta_0 + \beta_1 x + \varepsilon,$$

where Y is the strength of concrete after 28 days and x is the water/cement ratio
used in the concrete. If, in reality, the strength of concrete does not change with the
water/cement ratio, then $\beta_1 = 0$. Thus the engineer may wish to test $H_0: \beta_1 = 0$
versus $H_a: \beta_1 \neq 0$ in order to assess whether the independent variable has an influence
on the dependent variable. Or the engineer may wish to estimate the mean rate of
change $\beta_1$ in E(Y) for a 1-unit change in the water/cement ratio x.

In general, for any linear regression model, if the random error $\varepsilon$ is normally
distributed, we have established that $\hat{\beta}_i$ is an unbiased, normally distributed estimator
of $\beta_i$ with

$$V(\hat{\beta}_0) = c_{00}\sigma^2, \quad \text{where } c_{00} = \frac{\sum x_i^2}{n S_{xx}},$$

and

$$V(\hat{\beta}_1) = c_{11}\sigma^2, \quad \text{where } c_{11} = \frac{1}{S_{xx}}.$$

That is, the variances of both estimators are constant multiples of $\sigma^2$, the variance
of the error term in the model. Using this information, we can construct a test of the
hypothesis $H_0: \beta_i = \beta_{i0}$ ($\beta_{i0}$ is a specified value of $\beta_i$), using the test statistic

$$Z = \frac{\hat{\beta}_i - \beta_{i0}}{\sigma\sqrt{c_{ii}}},$$

where

$$c_{00} = \frac{\sum x_i^2}{n S_{xx}} \quad \text{and} \quad c_{11} = \frac{1}{S_{xx}}.$$

The rejection region for a two-tailed test is given by

$$|z| \geq z_{\alpha/2}.$$

As in the case of the simple Z tests studied in Chapter 10, to compute either of the
preceding Z statistics, we must either know $\sigma$ or possess a good estimate based on an
adequate number of degrees of freedom. (What would be adequate is a debatable point.
We suggest that the estimate be based on 30 or more degrees of freedom.) When this
estimate is unavailable (which usually is the case), an estimate of $\sigma$ may be calculated
from the experimental data (in accordance with the procedure of Section 11.4) and
substituted for $\sigma$ in the Z statistic. If we estimate $\sigma$ with $S = \sqrt{SSE/(n - 2)}$, the
resulting quantity

$$T = \frac{\hat{\beta}_i - \beta_{i0}}{S\sqrt{c_{ii}}}$$

can be shown to possess a Student's t distribution with n - 2 df (see Exercise 11.27).


Test of Hypothesis for $\beta_i$

$H_0: \beta_i = \beta_{i0}$.

$H_a$: $\beta_i > \beta_{i0}$ (upper-tail rejection region),
       $\beta_i < \beta_{i0}$ (lower-tail rejection region),
       $\beta_i \neq \beta_{i0}$ (two-tailed rejection region).

Test statistic: $T = \dfrac{\hat{\beta}_i - \beta_{i0}}{S\sqrt{c_{ii}}}$.

Rejection region: $t > t_\alpha$ (upper-tail alternative),
                  $t < -t_\alpha$ (lower-tail alternative),
                  $|t| > t_{\alpha/2}$ (two-tailed alternative),

where

$$c_{00} = \frac{\sum x_i^2}{n S_{xx}} \quad \text{and} \quad c_{11} = \frac{1}{S_{xx}}.$$

Notice that $t_\alpha$ is based on (n - 2) df.


EXAMPLE 11.4 


Solution 


Do the data of Example 11.1 present sufficient evidence to indicate that the slope 
differs from 0? Test using о = .05 and give bounds for the attained significance level. 


The preceding question assumes that the probabilistic model is a realistic description 
of the true response and implies a test of hypothesis Ho: 81 = 0 versus Ha: В 4 0 
in the linear model Y = fo + 61x + €. For these data, we determined in Example 11.1 
that fi = .7 and Syy = 10. Example 11.3 yielded 52 = SSE/(n — 2) = .367 and 
s = у .367 = .606. (Note: SSE is based on n — 2 = 3 df.) 

Because we are interested in the parameter В|, we need the value 


1 1 
С = — = — l.l. 
Пу 10 
Then, 
8, — Je 
==. ee 
S./C11 .6064/.1 


If we take $\alpha = .05$, the value of $t_{\alpha/2} = t_{.025}$ for 3 df is 3.182, and the rejection
region is

reject if $|t| > 3.182$.

Because the absolute value of the calculated value of $t$ is larger than 3.182, we reject
the null hypothesis that $\beta_1 = 0$ at the $\alpha = .05$ level of significance. Because the
test is two-tailed, p-value $= 2P(t > 3.65)$, where $t$ has a $t$ distribution with 3 df.
Using Table 5, Appendix 3, we find that $.01 < P(t > 3.65) < .025$. Thus, we
conclude that $.02 <$ p-value $< .05$. Hence, we would reject the null hypothesis
for any value of $\alpha \ge .05$. For values of $\alpha \le .02$, we would fail to reject the null
hypothesis. If we had chosen $.02 < \alpha < .05$, more specific information about the
p-value would be required. The applet Student's t Probabilities and Quantiles yields that,
with 3 df, p-value $= 2P(t > 3.65) = 2(.01775) = .0355$. Again, we notice the
agreement between the conclusions reached by the formal (fixed $\alpha$) test procedure
and the proper interpretation of the attained significance level.

As a further step in the analysis, we could look at the width of a confidence interval
for $\beta_1$ to see whether it is short enough to detect a departure from zero that would
be of practical significance. We will show that the confidence interval for $\beta_1$ is quite
wide, suggesting that the experimenter needs to collect more data before reaching a
decision. ■
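The arithmetic of Example 11.4 can be checked with a short program. The following Python sketch is only an illustration and is not part of the original solution: the helper names `t_density` and `t_upper_tail` are our own, and the tail probability is approximated by Simpson's rule on the $t$ density rather than read from Table 5 or the applet. The inputs are the summary values quoted from Examples 11.1 and 11.3.

```python
import math

def t_density(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even; by symmetry P(T > t) = 0.5 - integral of the density on [0, t].
    """
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3

# Summary values quoted in Examples 11.1 and 11.3
n, beta1_hat, s_xx, s = 5, 0.7, 10.0, 0.606

c11 = 1 / s_xx                                    # c_11 = 1 / S_xx
t_stat = (beta1_hat - 0) / (s * math.sqrt(c11))   # test statistic for H0: beta_1 = 0
p_value = 2 * t_upper_tail(abs(t_stat), n - 2)    # two-tailed attained significance level

print(f"t = {t_stat:.2f}, two-tailed p-value = {p_value:.4f}")
```

Because the program carries more decimal places in $t$ than the rounded value 3.65, the computed p-value may differ from .0355 in the fourth decimal place.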


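Based on the $t$ statistic given earlier, we can follow the procedures of Chapter 10 to
show that a confidence interval for $\beta_i$, with confidence coefficient $1 - \alpha$, is as follows.

A $100(1 - \alpha)\%$ Confidence Interval for $\beta_i$

$$\hat\beta_i \pm t_{\alpha/2}\, S \sqrt{c_{ii}},$$

where

$$c_{00} = \frac{\sum x_i^2}{n S_{xx}} \quad\text{and}\quad c_{11} = \frac{1}{S_{xx}}.$$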

EXAMPLE 11.5

Calculate a 95% confidence interval for the parameter $\beta_1$ of Example 11.4.

Solution

The tabulated value for $t_{.025}$, based on 3 df, is 3.182. Then the 95% confidence interval
for $\beta_1$ is

$$\hat\beta_1 \pm t_{.025}\, s \sqrt{c_{11}}.$$

Substituting, we get

$$.7 \pm (3.182)(.606)\sqrt{0.1}, \quad\text{or}\quad .7 \pm .610.$$

If we wish to estimate $\beta_1$ correct to within .15 unit, it is obvious that the confidence
interval is too wide and that the sample size must be increased. ■
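The interval of Example 11.5 can be reproduced in a few lines. This Python sketch is illustrative only; the function name `slope_confidence_interval` is hypothetical, and the critical value 3.182 is taken from the text rather than computed.

```python
import math

def slope_confidence_interval(beta1_hat, s, s_xx, t_crit):
    """Return (lower, upper) for beta1_hat +/- t_crit * s * sqrt(c11), with c11 = 1/S_xx."""
    half_width = t_crit * s * math.sqrt(1 / s_xx)
    return beta1_hat - half_width, beta1_hat + half_width

# Values from Examples 11.4 and 11.5: beta1_hat = .7, s = .606, S_xx = 10, t_.025 (3 df) = 3.182
lower, upper = slope_confidence_interval(0.7, 0.606, 10.0, 3.182)
half_width = (upper - lower) / 2

print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f}); half-width = {half_width:.3f}")
```

The lower limit is positive, which agrees with the two-tailed test of Example 11.4 rejecting $H_0: \beta_1 = 0$ at the .05 level.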


Exercises

11.23 Refer to Exercise 11.3.
  a Do the data present sufficient evidence to indicate that the slope $\beta_1$ differs from zero? (Test
    at the 5% significance level.)
  b What can be said about the attained significance level associated with the test implemented
    in part (a) using a table in the appendix?
  c Applet Exercise What can be said about the attained significance level associated with
    the test implemented in part (a) using the appropriate applet?
  d Find a 95% confidence interval for $\beta_1$.

11.24 Refer to Exercise 11.13. Do the data present sufficient evidence to indicate that the size x of
the anchovy catch contributes information for the prediction of the price y of the fish meal?
  a Give bounds on the attained significance level.
  b Applet Exercise What is the exact p-value?
  c Based on your answers to parts (a) and/or (b), what would you conclude at the $\alpha = .10$
    level of significance?

11.25 Do the data in Exercise 11.19 present sufficient evidence to indicate that the number of errors
is linearly related to the number of hours without sleep?
  a Give bounds on the attained significance level.
  b Applet Exercise Determine the exact p-value.
  c Based on your answers to parts (a) and/or (b), what would you conclude at the $\alpha = .05$
    level of significance?
  d Would you expect the relationship between y and x to be linear if x were varied over a
    wider range, say, from x = 4 to x = 48?
  e Give a 95% confidence interval for the slope. Provide a practical interpretation for this
    interval estimate.


11.26 Most sophomore physics students are required to conduct an experiment verifying Hooke's
law. Hooke's law states that when a force is applied to a body that is long in comparison to its
cross-sectional area, the change y in its length is proportional to the force x; that is,

$$y = \beta_1 x,$$

where $\beta_1$ is a constant of proportionality. The results of a physics student's laboratory
experiment are shown in the following table. Six lengths of steel wire, .34 millimeter (mm) in
diameter and 2 meters (m) long, were used to obtain the six force-length change measurements.

    Force x (kg)    Change in Length y (mm)
        29.4                 4.25
        39.2                 5.25
        49.0                 6.50
        58.8                 7.85
        68.6                 8.75
        78.4                10.00

  a Fit the model, $Y = \beta_0 + \beta_1 x + \varepsilon$, to the data, using the method of least squares.
  b Find a 95% confidence interval for the slope of the line.
  c According to Hooke's law, the line should pass through the point (0, 0); that is, $\beta_0$ should
    equal 0. Test the hypothesis that $E(Y) = 0$ when $x = 0$. Give bounds for the attained
    significance level.
  d Applet Exercise What is the exact p-value?
  e What would you conclude at the $\alpha = .05$ level?


11.27 Use the properties of the least-squares estimators given in Section 11.4 to complete the
following.
  a Show that under the null hypothesis $H_0: \beta_i = \beta_{i0}$

    $$T = \frac{\hat\beta_i - \beta_{i0}}{S\sqrt{c_{ii}}}$$

    possesses a $t$ distribution with $n - 2$ df, where i = 1, 2.
  b Derive the confidence intervals for $\beta_i$ given in this section.

11.28 Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent, normally distributed random variables with
$E(Y_i) = \beta_0 + \beta_1 x_i$ and $V(Y_i) = \sigma^2$, for $i = 1, 2, \ldots, n$. Show that the likelihood ratio
test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$ is equivalent to the $t$ test given in this section.

11.29 Let $Y_1, Y_2, \ldots, Y_n$ be as given in Exercise 11.28. Suppose that we have an additional set
of independent random variables $W_1, W_2, \ldots, W_m$, where $W_i$ is normally distributed with
$E(W_i) = \gamma_0 + \gamma_1 c_i$ and $V(W_i) = \sigma^2$, for $i = 1, 2, \ldots, m$. Construct a test of $H_0: \beta_1 = \gamma_1$
against $H_a: \beta_1 \ne \gamma_1$.


11.30 The octane number Y of refined petroleum is related to the temperature x of the refining process,
but it is also related to the particle size of the catalyst. An experiment with a small-particle
catalyst gave a fitted least-squares line of

$$\hat{y} = 9.360 + .155x,$$

with $n = 31$, $\hat{V}(\hat\beta_1) = (.0202)^2$, and SSE = 2.04. An independent experiment with a large-particle
catalyst gave

$$\hat{y} = 4.265 + .190x,$$

with $n = 11$, $\hat{V}(\hat\beta_1) = (.0193)^2$, and SSE = 1.86.⁷

  a Test the hypotheses that the slopes are significantly different from zero, with each test at
    the significance level of .05.
 *b Test at the .05 significance level that the two types of catalyst produce the same slope in
    the relationship between octane number and temperature. (Use the test that you developed
    in Exercise 11.29.)


11.31 Using a chemical procedure called differential pulse polarography, a chemist measured the
peak current generated (in microamperes, μA) when solutions containing different amounts of
nickel (measured in parts per billion, ppb) are added to different portions of the same buffer.⁸
Is there sufficient evidence to indicate that peak current increases as nickel concentrations
increase? Use $\alpha = .05$.

    x = Ni (ppb)    y = Peak Current (μA)
        19.1              .095
        38.2              .174
        57.3              .256
        76.2              .348
        95                .429
       114                .500
       131                .580
       150                .651
       170                .722


6. Exercises preceded by an asterisk are optional. 
7. Source: Gweyson and Cheasley, Petroleum Refiner (August 1959): 135. 
8. Source: Daniel C. Harris, Quantitative Chemical Analysis, 3rd ed. (New York, Freeman, 1991).

11.32 Refer to Exercises 11.5 and 11.17.
  a Is there sufficient evidence to indicate that the median sales price for new single-family
    houses increased over the period from 1972 through 1979 at the .01 level of significance?
  b Estimate the expected yearly increase in median sale price by constructing a 99% confidence
    interval.

11.33 Refer to Exercises 11.8 and 11.18. Is there evidence of a linear relationship between flow-through
and static LC50s? Test at the .05 significance level.

11.34 Refer to Exercise 11.33. Is there evidence of a linear relationship between flow-through and
static LC50s?
  a Give bounds for the attained significance level.
  b Applet Exercise What is the exact p-value?


11.6 Inferences Concerning Linear Functions of the Model Parameters:
Simple Linear Regression


In addition to making inferences about a single $\beta_i$, we frequently are interested in
making inferences about linear functions of the model parameters $\beta_0$ and $\beta_1$. For
example, we might wish to estimate $E(Y)$, given by

$$E(Y) = \beta_0 + \beta_1 x,$$

where $E(Y)$ represents the mean yield of a chemical process for the settings of
controlled process variable x or the mean mileage rating of four-cylinder gasoline
engines with cylinder volume x. Properties of estimators of such linear functions are
established in this section.

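Suppose that we wish to make an inference about the linear function

$$\theta = a_0\beta_0 + a_1\beta_1,$$

where $a_0$ and $a_1$ are constants (one of which may equal zero). Then, the same linear
function of the parameter estimators,

$$\hat\theta = a_0\hat\beta_0 + a_1\hat\beta_1,$$

is an unbiased estimator of $\theta$ because, by Theorem 5.12,

$$E(\hat\theta) = a_0 E(\hat\beta_0) + a_1 E(\hat\beta_1) = a_0\beta_0 + a_1\beta_1 = \theta.$$

Applying the same theorem, we determine that the variance of $\hat\theta$ is

$$V(\hat\theta) = a_0^2 V(\hat\beta_0) + a_1^2 V(\hat\beta_1) + 2a_0a_1\,\text{Cov}(\hat\beta_0, \hat\beta_1),$$

where $V(\hat\beta_i) = c_{ii}\sigma^2$ and $\text{Cov}(\hat\beta_0, \hat\beta_1) = c_{01}\sigma^2$, with

$$c_{00} = \frac{\sum x_i^2}{n S_{xx}}, \qquad c_{11} = \frac{1}{S_{xx}}, \qquad c_{01} = \frac{-\bar{x}}{S_{xx}}.$$

Some routine algebraic manipulations yield

$$V(\hat\theta) = \left(\frac{a_0^2\,\dfrac{\sum x_i^2}{n} + a_1^2 - 2a_0a_1\bar{x}}{S_{xx}}\right)\sigma^2.$$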
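The "routine algebraic manipulations" can be checked numerically. The Python sketch below is an illustration under stated assumptions: the x-values and the constants $a_0 = 1$, $a_1 = 3$ are hypothetical, chosen only to exercise the formulas, and the function names are our own. It compares the term-by-term expression $a_0^2c_{00} + a_1^2c_{11} + 2a_0a_1c_{01}$ with the closed form for the multiplier of $\sigma^2$ in $V(\hat\theta)$.

```python
def c_coefficients(xs):
    """Return (c00, c11, c01) for the x-values xs, as defined in this section."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    c00 = sum(x * x for x in xs) / (n * s_xx)
    c11 = 1 / s_xx
    c01 = -x_bar / s_xx
    return c00, c11, c01

def variance_factor_termwise(xs, a0, a1):
    """Multiplier of sigma^2 from a0^2 V(b0) + a1^2 V(b1) + 2 a0 a1 Cov(b0, b1)."""
    c00, c11, c01 = c_coefficients(xs)
    return a0 ** 2 * c00 + a1 ** 2 * c11 + 2 * a0 * a1 * c01

def variance_factor_closed(xs, a0, a1):
    """Multiplier of sigma^2 from the closed-form expression for V(theta-hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return (a0 ** 2 * sum(x * x for x in xs) / n + a1 ** 2 - 2 * a0 * a1 * x_bar) / s_xx

xs = [1.0, 2.0, 4.0, 7.0, 11.0]   # hypothetical design points
a0, a1 = 1.0, 3.0                 # hypothetical constants

termwise = variance_factor_termwise(xs, a0, a1)
closed = variance_factor_closed(xs, a0, a1)
print(termwise, closed)
```

The two functions should agree up to floating-point rounding for any choice of x-values (not all equal) and any constants.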


Finally, recalling that $\hat\beta_0$ and $\hat\beta_1$ are normally distributed in repeated sampling
(Section 11.4), it is clear that $\hat\theta$ is a linear function of normally distributed random
variables, implying that $\hat\theta$ is normally distributed.

Thus, we conclude that

$$Z = \frac{\hat\theta - \theta}{\sigma\sqrt{\dfrac{a_0^2\,\dfrac{\sum x_i^2}{n} + a_1^2 - 2a_0a_1\bar{x}}{S_{xx}}}}$$

has a standard normal distribution and could be employed to test the hypothesis

$$H_0: \theta = \theta_0$$

when $\theta_0$ is some specified value of $\theta = a_0\beta_0 + a_1\beta_1$. Likewise, a $100(1 - \alpha)\%$
confidence interval for $\theta = a_0\beta_0 + a_1\beta_1$ is

$$\hat\theta \pm z_{\alpha/2}\,\sigma_{\hat\theta}.$$

We notice that, in both the Z statistic and the confidence interval above, $\sigma_{\hat\theta} = \sqrt{V(\hat\theta)}$
is a constant (depending on the sample size n, the values of the x's, and the
values of the a's) multiple of $\sigma$. If we substitute S for $\sigma$ in the expression for Z, the
resulting expression (which we identify as T) possesses a Student's $t$ distribution in
repeated sampling, with $n - 2$ df, and provides a test statistic to test hypotheses about
$\theta = a_0\beta_0 + a_1\beta_1$.

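Appropriate tests are summarized as follows.

A Test for $\theta = a_0\beta_0 + a_1\beta_1$

$$H_0: \theta = \theta_0,$$

$$H_a: \begin{cases} \theta > \theta_0, \\ \theta < \theta_0, \\ \theta \ne \theta_0. \end{cases}$$

$$\text{Test statistic: } T = \frac{\hat\theta - \theta_0}{S\sqrt{\dfrac{a_0^2\,\dfrac{\sum x_i^2}{n} + a_1^2 - 2a_0a_1\bar{x}}{S_{xx}}}}.$$

$$\text{Rejection region: } \begin{cases} t > t_\alpha, \\ t < -t_\alpha, \\ |t| > t_{\alpha/2}. \end{cases}$$

Here, $t_\alpha$ and $t_{\alpha/2}$ are based on $n - 2$ df.

The corresponding $100(1 - \alpha)\%$ confidence interval for $\theta = a_0\beta_0 + a_1\beta_1$ is as
follows.

A $100(1 - \alpha)\%$ Confidence Interval for $\theta = a_0\beta_0 + a_1\beta_1$

$$\hat\theta \pm t_{\alpha/2}\, S\sqrt{\frac{a_0^2\,\dfrac{\sum x_i^2}{n} + a_1^2 - 2a_0a_1\bar{x}}{S_{xx}}},$$

where the tabulated $t_{\alpha/2}$ is based on $n - 2$ df.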


One useful application of the hypothesis-testing and confidence interval techniques
just presented is to the problem of estimating $E(Y)$, the mean value of Y, for a fixed
value of the independent variable x. In particular, if $x^*$ denotes a specific value of x
that is of interest, then

$$E(Y) = \beta_0 + \beta_1 x^*.$$

Notice that $E(Y)$ is a special case of $a_0\beta_0 + a_1\beta_1$, with $a_0 = 1$ and $a_1 = x^*$. Thus, an
inference about $E(Y)$ when $x = x^*$ can be made by using the techniques developed
earlier for general linear combinations of the $\beta$'s.

In the context of estimating the mean value for Y, $E(Y) = \beta_0 + \beta_1 x^*$ when the
independent variable x takes on the value $x^*$, it can be shown (see Exercise 11.35)
that, with $a_0 = 1$, $a_1 = x^*$,

$$\frac{a_0^2\,\dfrac{\sum x_i^2}{n} + a_1^2 - 2a_0a_1\bar{x}}{S_{xx}} = \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}.$$

A confidence interval for the mean value of Y when $x = x^*$, a particular value of x,
is as follows.

A $100(1 - \alpha)\%$ Confidence Interval for $E(Y) = \beta_0 + \beta_1 x^*$

$$\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2}\, S\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}},$$

where the tabulated $t_{\alpha/2}$ is based on $n - 2$ df.


This formula makes it easy to see that for a fixed value of n and for given x-values,
the shortest confidence interval for $E(Y)$ is obtained when $x^* = \bar{x}$, the average of the
x-values used in the experiment. If our objective is to plan an experiment that yields
short confidence intervals for $E(Y)$ when $x = x^*$, n should be large, $S_{xx}$ should be
large (if possible), and $\bar{x}$ should be near $x^*$. The physical interpretation of a large $S_{xx}$
is that when possible the values of x used in the experiment should be spread out as
much as possible.

EXAMPLE 11.6

For the data of Example 11.1, find a 90% confidence interval for $E(Y)$ when $x = 1$.

Solution

For the model of Example 11.1,

$$E(Y) = \beta_0 + \beta_1 x.$$

To estimate $E(Y)$ for any fixed value $x = x^*$, we use the unbiased estimator
$\widehat{E(Y)} = \hat\beta_0 + \hat\beta_1 x^*$. Then,

$$\hat\beta_0 + \hat\beta_1 x^* = 1 + .7x^*.$$

For this case, $x^* = 1$; and because $n = 5$, $\bar{x} = 0$, and $S_{xx} = 10$, it follows that

$$\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} = \frac{1}{5} + \frac{(1 - 0)^2}{10} = .3.$$

In Example 11.3, we found $s^2$ to be .367, or $s = .606$, for these data. The value of
$t_{.05}$ with $n - 2 = 3$ df is 2.353.
The confidence interval for $E(Y)$ when $x = 1$ is

$$\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

$$[1 + (.7)(1)] \pm (2.353)(.606)\sqrt{.3}$$

$$1.7 \pm .781.$$

That is, we are 90% confident that, when the independent variable takes on the
value $x = 1$, the mean value $E(Y)$ of the dependent variable is between .919 and
2.481. This interval obviously is very wide, but remember that it is based on only five
data points and was used solely for purposes of illustration. We will show you some
practical applications of regression analyses in Section 11.9. ■
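The computation in Example 11.6, and the claim that the interval is shortest at $x^* = \bar{x}$, can be illustrated as follows. This Python sketch is an illustration only; the function name `mean_response_interval` and the grid of $x^*$ values are our own choices, and the critical value 2.353 is taken from the text.

```python
import math

def mean_response_interval(beta0_hat, beta1_hat, x_star, n, x_bar, s_xx, s, t_crit):
    """Return (estimate, half_width) of the confidence interval for E(Y) at x = x_star."""
    estimate = beta0_hat + beta1_hat * x_star
    half_width = t_crit * s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)
    return estimate, half_width

# Values from Example 11.6: n = 5, x-bar = 0, S_xx = 10, s = .606, t_.05 (3 df) = 2.353
estimate, half_width = mean_response_interval(1.0, 0.7, 1.0, 5, 0.0, 10.0, 0.606, 2.353)
print(f"E(Y) at x* = 1: {estimate:.1f} +/- {half_width:.3f}")

# Half-widths over a hypothetical grid of x* values; the smallest occurs at x* = x-bar = 0
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
widths = {x: mean_response_interval(1.0, 0.7, x, 5, 0.0, 10.0, 0.606, 2.353)[1] for x in grid}
narrowest = min(widths, key=widths.get)
print(f"narrowest interval at x* = {narrowest}")
```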


Exercises

11.35 For the simple linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$ with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$, use
the expression for $V(a_0\hat\beta_0 + a_1\hat\beta_1)$ derived in this section to show that

$$V(\hat\beta_0 + \hat\beta_1 x^*) = \left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}\right]\sigma^2.$$

For what value of $x^*$ does the confidence interval for $E(Y)$ achieve its minimum length?

11.36 Refer to Exercises 11.13 and 11.24. Find the 90% confidence interval for the mean price per
ton of fish meal if the anchovy catch is 5 million metric tons.

11.37 Using the model fit to the data of Exercise 11.8, construct a 95% confidence interval for the
mean value of flow-through LC50 measurements for a toxicant that has a static LC50 of 12
parts per million. (Also see Exercise 11.18.)

11.38 Refer to Exercise 11.3. Find a 90% confidence interval for $E(Y)$ when $x^* = 0$. Then find 90%
confidence intervals for $E(Y)$ when $x^* = -2$ and $x^* = +2$. Compare the lengths of these
intervals. Plot these confidence limits on the graph you constructed for Exercise 11.3.

11.39 Refer to Exercise 11.16. Find a 95% confidence interval for the mean potency of a 1-ounce
portion of antibiotic stored at 65°F.

11.40 Refer to Exercise 11.14. Find a 90% confidence interval for the expected proportion of survivors
at time period .30.

*11.41 Refer to Exercise 11.4. Suppose that the sample given there came from a large but finite
population of inventory items. We wish to estimate the population mean of the audited values,
using the fact that book values are known for every item on inventory. If the population contains
N items and

$$E(Y_i) = \mu_i = \beta_0 + \beta_1 x_i,$$

then the population mean is given by

$$\mu_Y = \frac{1}{N}\sum_{i=1}^{N}\mu_i = \beta_0 + \beta_1\left(\frac{1}{N}\right)\sum_{i=1}^{N} x_i = \beta_0 + \beta_1\mu_x.$$

  a Using the least-squares estimators of $\beta_0$ and $\beta_1$, show that $\mu_Y$ can be estimated by

    $$\hat\mu_Y = \bar{y} + \hat\beta_1(\mu_x - \bar{x}).$$

    (Notice that $\bar{y}$ is adjusted up or down, depending on whether $\bar{x}$ is larger or smaller than
    $\mu_x$.)
  b Using the data of Exercise 11.4 and the fact that $\mu_x = 74.0$, estimate $\mu_Y$, the mean of the
    audited values, and place a 2-standard-deviation bound on the error of estimation. (Regard
    the $x_i$-values as constants when computing the variance of $\hat\mu_Y$.)


11.7 Predicting a Particular Value of Y by
Using Simple Linear Regression


Suppose that for a fixed pressure the yield Y for a chemical experiment is a function
of the temperature x at which the experiment is run. Assume that a linear model of
the form

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

adequately represents the response function traced by Y over the experimental region
of interest. In Section 11.6, we discussed methods for estimating $E(Y)$ for a given
temperature, say, $x^*$. That is, we know how to estimate the mean yield $E(Y)$ of the
process at the setting $x = x^*$.

Now consider a different problem. Instead of estimating the mean yield at x*, we 
wish to predict the particular response Y that we will observe if the experiment is run 
at some time in the future (such as next Monday). This situation would occur if, for 
some reason, the response next Monday held a special significance to us. Prediction 
problems frequently occur in business where we may be interested in next month's 
profit on a specific investment rather than the average gain per investment in a large 
portfolio of similar stocks.

Notice that Y is a random variable, not a parameter; predicting its value therefore
represents a departure from our previous objective of making inferences about
population parameters. If it is reasonable to assume that $\varepsilon$ is normally distributed with
mean 0 and variance $\sigma^2$, it follows that Y is normally distributed with mean $\beta_0 + \beta_1 x$
and variance $\sigma^2$. If the distribution of a random variable Y is known and a single
value of Y is then selected, how would you predict the observed value? We contend
that you would select a value of Y near the center of the distribution, in particular,
a value near the expected value of Y. If we are interested in the value of Y when
$x = x^*$, call it $Y^*$, we could employ $\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x^*$ as a predictor of a particular
value of $Y^*$ and as an estimator of $E(Y)$ as well.

If $x = x^*$, the error of predicting a particular value of $Y^*$, using $\hat{Y}^*$ as the predictor,
is the difference between the actual value of $Y^*$ and the predicted value:

$$\text{error} = Y^* - \hat{Y}^*.$$

Let us now investigate the properties of this error in repeated sampling.

Because both $Y^*$ and $\hat{Y}^*$ are normally distributed random variables, their difference
(the error) is also normally distributed.


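Applying Theorem 5.12, which gives the formulas for the expected value and
variance of a linear function of random variables, we obtain

$$E(\text{error}) = E(Y^* - \hat{Y}^*) = E(Y^*) - E(\hat{Y}^*),$$

and because $E(\hat{Y}^*) = \beta_0 + \beta_1 x^* = E(Y^*)$,

$$E(\text{error}) = 0.$$

Likewise,

$$V(\text{error}) = V(Y^* - \hat{Y}^*) = V(Y^*) + V(\hat{Y}^*) - 2\,\text{Cov}(Y^*, \hat{Y}^*).$$

Because we are predicting a future value $Y^*$ that is not employed in the computation
of $\hat{Y}^*$, it follows that $Y^*$ and $\hat{Y}^*$ are independent and hence that $\text{Cov}(Y^*, \hat{Y}^*) = 0$.
Then,

$$V(\text{error}) = V(Y^*) + V(\hat{Y}^*) = \sigma^2 + V(\hat\beta_0 + \hat\beta_1 x^*)$$

$$= \sigma^2 + \left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}\right]\sigma^2$$

$$= \sigma^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}\right].$$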

We have shown that the error of predicting a particular value of Y is normally
distributed with mean 0 and variance as given in the preceding equation. It follows that

$$Z = \frac{Y^* - \hat{Y}^*}{\sigma\sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}}$$

has a standard normal distribution. Furthermore, if S is substituted for $\sigma$, it can be
shown that

$$T = \frac{Y^* - \hat{Y}^*}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}}$$

possesses a Student's $t$ distribution with $n - 2$ df. We use this result to place a bound
on the error of prediction; in doing so, we construct a prediction interval for the
random variable $Y^*$. The procedure employed is similar to that used to construct the
confidence intervals presented in the preceding chapters.

We begin by observing that

$$P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1 - \alpha.$$

Substituting for T, we obtain

$$P\left[-t_{\alpha/2} < \frac{Y^* - \hat{Y}^*}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}} < t_{\alpha/2}\right] = 1 - \alpha.$$

In other words, in repeated sampling the inequality within the brackets will hold with
a probability equal to $(1 - \alpha)$. Furthermore, the inequality will continue to hold with
the same probability if each term is multiplied by the same positive factor or if the
same quantity is added to each term of the inequality. Multiply each term by

$$S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

and then add $\hat{Y}^*$ to each to obtain

$$P\left[\hat{Y}^* - t_{\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} < Y^* < \hat{Y}^* + t_{\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}\right] = 1 - \alpha.$$

Thus, we have placed an interval about $\hat{Y}^*$ that in repeated sampling will contain the
actual value of $Y^*$ with probability $1 - \alpha$. That is, we have obtained a $100(1 - \alpha)\%$
prediction interval for $Y^*$.


A $100(1 - \alpha)\%$ Prediction Interval for Y when $x = x^*$

$$\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$

In attempting to place a bound on the error of predicting Y, we would expect the
error to be less in absolute value than

$$t_{\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

with probability equal to $(1 - \alpha)$.

Notice that the length of a confidence interval for $E(Y)$ when $x = x^*$ is given by

$$2 \times t_{\alpha/2}\, S\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}},$$

whereas the length of a prediction interval for an actual value of Y when $x = x^*$ is
given by

$$2 \times t_{\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$

Thus, we observe that prediction intervals for the actual value of Y are longer than
confidence intervals for $E(Y)$ if both are determined for the same value of $x^*$.


EXAMPLE 11.7

Suppose that the experiment that generated the data of Example 11.1 is to be run
again with $x = 2$. Predict the particular value of Y with $1 - \alpha = .90$.

Solution

From Example 11.1, we have

$$\hat\beta_0 = 1 \quad\text{and}\quad \hat\beta_1 = .7,$$

so the predicted value of Y with $x = 2$ is

$$\hat\beta_0 + \hat\beta_1 x^* = 1 + (.7)(2) = 2.4.$$

Further, with $x^* = 2$,

$$\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} = \frac{1}{5} + \frac{(2 - 0)^2}{10} = .6.$$

From Example 11.3, we know that $s = .606$. The $t_{.05}$ value with 3 df is 2.353. Thus,
the prediction interval is

$$\hat\beta_0 + \hat\beta_1 x^* \pm t_{.05}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

$$2.4 \pm (2.353)(.606)\sqrt{1 + .6}$$

$$2.4 \pm 1.804.$$ ■
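To see how much wider the prediction interval of Example 11.7 is than the corresponding confidence interval for $E(Y)$, both half-widths can be computed side by side. The Python sketch below is an illustration only; the function name `interval_half_widths` and the variable `spread_term` are hypothetical, and the inputs are the values quoted in Examples 11.1, 11.3, and 11.7.

```python
import math

def interval_half_widths(x_star, n, x_bar, s_xx, s, t_crit):
    """Return (ci_half, pi_half): half-widths of the confidence interval for E(Y)
    and of the prediction interval for Y at x = x_star."""
    spread_term = 1 / n + (x_star - x_bar) ** 2 / s_xx
    ci_half = t_crit * s * math.sqrt(spread_term)
    pi_half = t_crit * s * math.sqrt(1 + spread_term)
    return ci_half, pi_half

# Values from Example 11.7: x* = 2, n = 5, x-bar = 0, S_xx = 10, s = .606, t_.05 (3 df) = 2.353
ci_half, pi_half = interval_half_widths(2.0, 5, 0.0, 10.0, 0.606, 2.353)
predicted = 1.0 + 0.7 * 2.0

print(f"predicted Y: {predicted:.1f}")
print(f"90% prediction interval half-width: {pi_half:.3f}")
print(f"90% confidence interval half-width for E(Y): {ci_half:.3f}")
```

The extra 1 under the square root reflects the variance $\sigma^2$ of the future observation itself, so the prediction half-width exceeds the confidence half-width at every $x^*$.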


Figure 11.7 represents some hypothetical data and the estimated regression line
fitted to those data that indicates the estimated value of $E(Y)$ when $x = 8$. Also shown
on this graph are confidence bands for $E(Y)$. For each value of x, we computed

$$\hat\beta_0 + \hat\beta_1 x \pm t_{\alpha/2}\, S\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}.$$

Thus, for each value of x we obtain a confidence interval for $E(Y)$. The confidence
interval for $E(Y)$ when $x = 7$ is displayed on the y-axis in the figure. Notice that the
distance between the confidence bands is smallest when $x = \bar{x}$, as expected. Using
the same approach, we computed prediction bands for the prediction of an actual
Y-value for each setting of x. As discussed earlier, for each fixed value of x, the
prediction interval is wider than the corresponding confidence interval. The result is
that the prediction bands fall uniformly farther from the prediction line than do the
confidence bands. The prediction bands are also closest together when $x = \bar{x}$.

FIGURE 11.7 Some hypothetical data and associated confidence and prediction bands
(y plotted against x, showing the fitted line, 95% confidence bands for $E(Y)$, and 95% prediction bands for Y)


Exercises

11.42 Suppose that the model $Y = \beta_0 + \beta_1 x + \varepsilon$ is fit to the n data points $(y_1, x_1), \ldots, (y_n, x_n)$. At
what value of x will the length of the prediction interval for Y be minimized?

11.43 Refer to Exercises 11.5 and 11.17. Use the data and model given there to construct a 95%
prediction interval for the median sale price in 1980.

11.44 Refer to Exercise 11.43. Find a 95% prediction interval for the median sale price for the year
1981. Repeat for 1982. Would you feel comfortable in using this model and the data of Exercise
11.5 to predict the median sale price for the year 1988?

11.45 Refer to Exercises 11.8 and 11.18. Find a 95% prediction interval for a flow-through LC50 if
the static LC50 is observed to be 12 parts per million. Compare the length of this interval to
that of the interval found in Exercise 11.37.

11.46 Refer to Exercise 11.16. Find a 95% prediction interval for the potency of a 1-ounce portion
of antibiotic stored at 65°F. Compare this interval to that calculated in Exercise 11.39.

11.47 Refer to Exercise 11.14. Find a 95% prediction interval for the proportion of survivors at time
$x = .60$.


11.8 Correlation


The previous sections of this chapter dealt with modeling a response Y as a linear 
function of a nonrandom variable x so that appropriate inferences could be made 
concerning the expected value of Y, or a future value of Y, for a given value of x. 
These models are useful in two quite different practical situations. 

First, the variable x may be completely controlled by the experimenter. This occurs, 
for example, if x is the temperature setting and Y is the yield in a chemical experiment. 
Then, x is merely the point at which the temperature dial is set when the experiment 
is run. Of course, x could vary from experiment to experiment, but it is under the 
complete control, practically speaking, of the experimenter. The linear model 


$$Y = \beta_0 + \beta_1 x + \varepsilon$$

then implies that

$$E(Y) = \beta_0 + \beta_1 x$$

or that the average yield is a linear function of the temperature setting.

Second, the variable x may be an observed value of a random variable X. For 
example, we may want to relate the volume of usable timber Y in a tree to the 
circumference X of the base. If a functional relationship could be established, then 
in the future we could predict the amount of timber in any tree simply by measuring 
the circumference of the base. For this situation, we use the model 


$$Y = \beta_0 + \beta_1 x + \varepsilon$$

to imply that

$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$

That is, we are assuming that the conditional expectation of Y for a fixed value of X is
a linear function of the x-value. We generally assume that the vector random variable
(X, Y) has a bivariate normal distribution with $E(X) = \mu_X$, $E(Y) = \mu_Y$, $V(X) = \sigma_X^2$,
$V(Y) = \sigma_Y^2$, and correlation coefficient $\rho$ (see Section 5.10), in which case it can
be shown that

$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \quad\text{where}\quad \beta_1 = \frac{\sigma_Y}{\sigma_X}\,\rho.$$
X 


The statistical theory for making inferences about the parameters $\beta_0$ and $\beta_1$ is
exactly the same for both of these cases, but the differences in model interpretation
should be kept in mind.

For the case where (X, Y) has a bivariate distribution, the experimenter may not
always be interested in the linear relationship defining $E(Y \mid X)$. He or she may want
to know only whether the random variables X and Y are independent. If (X, Y) has
a bivariate normal distribution (see Section 5.10), then testing for independence is
equivalent to testing whether the correlation coefficient $\rho$ is equal to zero. Recall from
Section 5.7 that $\rho$ is positive if X and Y tend to increase together and $\rho$ is negative if
Y decreases as X increases.

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample from a bivariate
normal distribution. The maximum-likelihood estimator of $\rho$ is given by the sample
correlation coefficient:

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2\,\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}.$$

Notice that we can express r in terms of familiar quantities:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \hat\beta_1\sqrt{\frac{S_{xx}}{S_{yy}}}.$$

It follows that r and $\hat\beta_1$ have the same sign.
In the case where (X, Y) has a bivariate normal distribution, we have indicated
that

$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \quad\text{where}\quad \beta_1 = \frac{\sigma_Y}{\sigma_X}\,\rho.$$

Thus, for example, testing $H_0: \rho = 0$ versus $H_a: \rho > 0$ is equivalent to testing
$H_0: \beta_1 = 0$ versus $H_a: \beta_1 > 0$. Similarly, $H_a: \rho < 0$ is equivalent to $H_a: \beta_1 < 0$,
and $H_a: \rho \ne 0$ is equivalent to $H_a: \beta_1 \ne 0$. Tests for each of these sets of hypotheses
involving $\beta_1$ can be based (see Section 11.5) on the statistic

$$T = \frac{\hat\beta_1 - 0}{S/\sqrt{S_{xx}}},$$

which possesses a $t$ distribution with $n - 2$ df. In fact (see Exercise 11.55), this statistic
can be rewritten in terms of r as follows:

$$T = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}.$$

Because the preceding two $t$ statistics are algebraic equivalents, both possess the same
distribution: the $t$ distribution with $n - 2$ df.

It would seem natural to use r as a test statistic to test more general hypotheses
about $\rho$, but the probability distribution for r is difficult to obtain. The difficulty can
be overcome, for moderately large samples, by using the fact that $(1/2)\ln[(1 + r)/(1 - r)]$
is approximately normally distributed with mean $(1/2)\ln[(1 + \rho)/(1 - \rho)]$
and variance $1/(n - 3)$. Thus, for testing the hypothesis $H_0: \rho = \rho_0$, we can employ
a Z test in which

$$Z = \frac{\dfrac{1}{2}\ln\left(\dfrac{1 + r}{1 - r}\right) - \dfrac{1}{2}\ln\left(\dfrac{1 + \rho_0}{1 - \rho_0}\right)}{\dfrac{1}{\sqrt{n - 3}}}.$$

If $\alpha$ is the desired probability of a type I error, the form of the rejection region
depends on the alternative hypothesis. The various alternatives of most frequent
interest and the corresponding rejection regions are as follows:

$H_a: \rho > \rho_0$,    RR: $z > z_\alpha$,
$H_a: \rho < \rho_0$,    RR: $z < -z_\alpha$,
$H_a: \rho \ne \rho_0$,    RR: $|z| > z_{\alpha/2}$.
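The Z test above is straightforward to program. The following Python sketch is an illustration under stated assumptions: the function name `fisher_z_test` is our own, and the inputs $r = .6$, $n = 28$, and $\rho_0 = .3$ are hypothetical values chosen only to exercise the formula. It uses the identity $(1/2)\ln[(1 + r)/(1 - r)] = \tanh^{-1}(r)$ and the standard normal distribution from Python's standard library.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, rho0, n):
    """Return (z, upper_tail_p, two_tailed_p) for testing H0: rho = rho0.

    Relies on the large-sample normal approximation for (1/2) ln[(1 + r)/(1 - r)],
    which has approximate variance 1/(n - 3).
    """
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    upper = 1 - NormalDist().cdf(z)
    two_tailed = 2 * min(upper, 1 - upper)
    return z, upper, two_tailed

# Hypothetical inputs: sample correlation .6 from n = 28 pairs, null value rho0 = .3
z, upper_p, two_tailed_p = fisher_z_test(0.6, 0.3, 28)
z_alpha = NormalDist().inv_cdf(0.95)   # critical value for an upper-tail test at alpha = .05

print(f"z = {z:.3f}, upper-tail p = {upper_p:.4f}, reject at .05: {z > z_alpha}")
```

Because the normal approximation is intended for moderately large samples, results for small n should be read with caution.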


We illustrate with an example.

EXAMPLE 11.8

The data in Table 11.3 represent a sample of mathematics achievement test scores and
calculus grades for ten independently selected college freshmen. From this evidence,
would you say that the achievement test scores and calculus grades are independent?
Use $\alpha = .05$. Identify the corresponding attained significance level.

Table 11.3 Data for Example 11.8

    Student    Mathematics Achievement Test Score    Final Calculus Grade

Solution

We state as the null hypothesis that X and Y are independent; or, assuming that (X, Y)
has a bivariate normal distribution, we test $H_0: \rho = 0$ versus $H_a: \rho \ne 0$. Because we
are focusing on $\rho = 0$, the test can be based on the statistic $t = (r\sqrt{n - 2})/\sqrt{1 - r^2}$.
Denoting achievement test scores by x and calculus grades by y, we calculate

$$\sum x_i = 460, \qquad \sum x_i^2 = 23{,}634, \qquad S_{xx} = 2{,}474,$$
$$\sum y_i = 760, \qquad \sum y_i^2 = 59{,}816, \qquad S_{yy} = 2{,}056,$$
$$\sum x_i y_i = 36{,}854, \qquad S_{xy} = 1{,}894.$$

Thus,

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{1894}{\sqrt{(2474)(2056)}} = .8398.$$

The value of the test statistic is

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{(.8398)\sqrt{8}}{\sqrt{1 - .7053}} = 4.375.$$

Because t is based on $n - 2 = 8$ df, $t_{\alpha/2} = t_{.025} = 2.306$; the observed value of
our test statistic lies in the rejection region. Thus, the evidence strongly suggests that
achievement test scores and calculus grades are dependent. Notice that $\alpha = .05$ is
the probability that our test statistic will fall in the rejection region when $H_0$ is true.
Hence, we are fairly confident that we have made a correct decision.

Because we are implementing a two-tailed test, p-value $= 2P(t > 4.375)$.
From the values contained in Table 5, Appendix 3, it follows that $P(t > 4.375) < .005$.
Thus, p-value $< 2(.005) = .010$, and for any value of $\alpha$ greater than .01
(including $\alpha = .05$, as used in the initial part of this analysis), we would conclude
that $\rho \ne 0$. The applet Student's t Probabilities and Quantiles, used with 8 df, yields
that p-value $= 2P(t > 4.375) = 2(.00118) = .00236$, a value considerably smaller
than the upper bound for the p-value that was obtained by using Table 5. ■
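The calculations of Example 11.8 depend only on the five sums quoted in the solution, so they can be reproduced without the raw data. The Python sketch below is an illustration only, with variable names of our own choosing. It also computes the slope-based statistic $\hat\beta_1/(s/\sqrt{S_{xx}})$ to show numerically that it agrees with the form written in terms of r.

```python
import math

# Sums quoted in the solution of Example 11.8 (n = 10 students)
n = 10
sum_x, sum_x2 = 460, 23634
sum_y, sum_y2 = 760, 59816
sum_xy = 36854

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Equivalent slope-based statistic: beta1_hat / (s / sqrt(S_xx))
beta1_hat = s_xy / s_xx
sse = s_yy - beta1_hat * s_xy
s = math.sqrt(sse / (n - 2))
t_from_slope = beta1_hat / (s / math.sqrt(s_xx))

print(f"r = {r:.4f}, t (from r) = {t_from_r:.3f}, t (from slope) = {t_from_slope:.3f}")
```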


Notice that the square of the correlation coefficient occurs in the denominator of
the $t$ statistic used to implement the test of hypotheses in Example 11.8. The statistic
$r^2$ is called the coefficient of determination and has an interesting and useful
interpretation. Originally (Section 11.3), we defined SSE as the sum of the squares of the
differences between the observed and predicted values of the $y_i$'s,

$$\text{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}[y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2.$$

If the simple linear regression model fits the data well, the differences between the
observed and predicted values are small, leading to a small value for SSE. Analogously,
if the regression model fits poorly, SSE will be large. In Exercise 11.15, you
showed that a computationally convenient equation for SSE is

$$\text{SSE} = S_{yy} - \hat\beta_1 S_{xy}, \quad\text{where}\quad \hat\beta_1 = \frac{S_{xy}}{S_{xx}}.$$

Using this expression it was easy to show (Exercise 11.15(b)) that $\text{SSE} \le S_{yy}$. The
quantity $S_{yy} = \sum(y_i - \bar{y})^2$ provides a measure of the total variation among the
y-values, ignoring the x's. Alternatively, SSE measures the variation in the y-values
that remains unexplained after using the x's to fit the simple linear regression model.
Thus, the ratio $\text{SSE}/S_{yy}$ gives the proportion of the total variation in the $y_i$'s that is
unexplained by the linear regression model.
Notice that the coefficient of determination may be written as

$$r^2 = \left(\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\right)^2 = \left(\frac{S_{xy}}{S_{xx}}\right)\left(\frac{S_{xy}}{S_{yy}}\right) = \frac{\hat\beta_1 S_{xy}}{S_{yy}} = \frac{S_{yy} - \text{SSE}}{S_{yy}} = 1 - \frac{\text{SSE}}{S_{yy}}.$$

Thus, $r^2$ can be interpreted as the proportion of the total variation in the $y_i$'s that is
explained by the variable x in a simple linear regression model.
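The chain of equalities for $r^2$ can be verified numerically with the quantities $S_{xx} = 2{,}474$, $S_{yy} = 2{,}056$, and $S_{xy} = 1{,}894$ from Example 11.8. The Python sketch below is an illustration only; the variable names are our own.

```python
# Quantities computed in Example 11.8
s_xx, s_yy, s_xy = 2474.0, 2056.0, 1894.0

beta1_hat = s_xy / s_xx
sse = s_yy - beta1_hat * s_xy               # SSE = S_yy - beta1_hat * S_xy

r_squared_direct = s_xy ** 2 / (s_xx * s_yy)   # (S_xy / sqrt(S_xx S_yy))^2
r_squared_from_sse = 1 - sse / s_yy            # 1 - SSE / S_yy
unexplained = sse / s_yy                       # proportion of variation left unexplained

print(f"SSE = {sse:.1f}")
print(f"r^2 = {r_squared_direct:.4f} (direct), {r_squared_from_sse:.4f} (from SSE)")
print(f"unexplained proportion = {unexplained:.4f}")
```

The two forms of $r^2$ should agree up to rounding, and the explained and unexplained proportions sum to 1.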


EXAMPLE 11.9 Refer to Example 11.8 where we calculated the correlation coefficient between
mathematics achievement test scores and final calculus grades for ten independently selected
college freshmen. Interpret the values of the correlation coefficient and the coefficient
of determination.


Solution In Example 11.8, we obtained r = .8398. Since r is positive, we conclude that
freshmen with higher achievement test scores tend to earn higher calculus grades.
The coefficient of determination is $r^2 = (.8398)^2 = .7053$. Thus, 70.53% of the
variation in the final calculus grades is explained by fitting the simple linear model
using math achievement scores as the independent variable. The regression model
works very well.
Exercises


The accompanying table gives the peak power load for a power plant and the daily high
temperature for a random sample of 10 days. Test the hypothesis that the population correlation
coefficient $\rho$ between peak power load and high temperature is zero versus the alternative that
it is positive. Use $\alpha = .05$. Bound or determine the attained significance level.


Day   High Temperature (°F)   Peak Load

1     95     214
2     82     152
3     90     156
4     81     129
5     99     254
6     100    266
7     93     210
8     95     204
9     93     213
10    87     150


Applet Exercise Refer to Example 11.1 and Exercise 11.2. Access the applet Fitting a Line
Using Least Squares. The data that appear on the first graph are from Example 11.1.


a Drag the blue line to obtain an equation that visually fits the data well. What do you notice
about the values of SSE and $r^2$ as the fit of the line improves? Why does $r^2$ increase as
SSE decreases?


b Click the button “Find Best Model” to obtain the least-squares line. What is the value of
$r^2$? What is the value of the correlation coefficient?


Applet Exercise Refer to Exercises 11.5 and 11.6. The data from Exercise 11.5 appear in the 
graph under the heading "Another Example" in the applet Fitting a Line Using Least Squares. 


a Drag the blue line to obtain an equation that visually fits the data well. What do you notice
about the value of $r^2$ as the fit of the line improves?

b Click the button “Find Best Model” to obtain the least-squares line. What is the value of
$r^2$? What is the value of the correlation coefficient?


c Why is the value of $r^2$ so much larger than the value of $r^2$ that you obtained in Exercise
11.49(b) that used the data from Example 11.1?


In Exercise 11.8 both the flow-through and static LC50 values could be considered random
variables. Using the data of Exercise 11.8, test to see whether the correlation between static
and flow-through values significantly differs from zero. Use $\alpha = .01$. Bound or determine the
associated p-value.


Is the plant density of a species related to the altitude at which data are collected? Let Y denote
the species density and X denote the altitude. A fit of a simple linear regression model using
14 observations yielded $\hat{y} = 21.6 - 7.79x$ and $r^2 = .61$.


a What is the value of the correlation coefficient r?


b What proportion of the variation in densities is explained by the linear model using altitude
as the independent variable?


c Is there sufficient evidence at the $\alpha = .05$ level to indicate that plant densities decrease with an
increase in altitude?


The correlation coefficient for the heights and weights of ten offensive backfield football 
players was determined to be r = .8261. 


a What percentage of the variation in weights was explained by the heights of the players?
b What percentage of the variation in heights was explained by the weights of the players?


c Is there sufficient evidence at the $\alpha = .01$ level to claim that heights and weights are
positively correlated?


d Applet Exercise What is the attained significance level associated with the test performed
in part (c)?


Suppose that we seek an intuitive estimator for

$$\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

a The method-of-moments estimator of $\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ is

$$\widehat{\text{Cov}}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}).$$

Show that the method-of-moments estimators for the standard deviations of X and Y are

$$\hat{\sigma}_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{and} \quad \hat{\sigma}_Y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$

b Substitute the estimators for their respective parameters in the definition of $\rho$ and obtain
the method-of-moments estimator for $\rho$. Compare your estimator to r, the maximum-likelihood
estimator for $\rho$ presented in this section.


Consider the simple linear regression model based on normal theory. If we are interested in
testing $H_0: \beta_1 = 0$ versus various alternatives, the statistic

$$T = \frac{\hat{\beta}_1 - 0}{S / \sqrt{S_{xx}}}$$

possesses a t distribution with n − 2 df if the null hypothesis is true. Show that the equation
for T can also be written as

$$T = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}.$$


Refer to Exercise 11.55. Is r = .8 big enough to claim $\rho > 0$ at the $\alpha = .05$ significance level?

a Assume n = 5 and implement the test.

b Assume n = 12 and implement the test.

c Applet Exercise Determine the p-values for the tests implemented in parts (a) and (b).

d Did you reach the same conclusions in parts (a) and (b)? Why or why not?

e Why is the p-value associated with the test in part (b) so much smaller than the p-value
associated with the test performed in part (a)?

Refer to Exercises 11.55 and 11.56.


a What term in the T statistic determines whether the value of t is positive or negative?


b What quantities determine the size of |t|?


Refer to Exercise 11.55. If n = 4, what is the smallest value of r that will allow you to conclude
that $\rho > 0$ at the $\alpha = .05$ level of significance?

11.59 Refer to Exercises 11.55 and 11.58. If n = 20, what is the largest value of r that will allow you
to conclude that $\rho < 0$ at the $\alpha = .05$ level of significance?


*11.60 Refer to Exercises 11.8 and 11.51. Suppose that independent tests, with the same toxicants and
species but in a different laboratory, showed r = .85 with n = 20. Test the hypothesis that the
two correlation coefficients between static and flow-through LC50 measurements are equal.
Use $\alpha = .05$.


11.9 Some Practical Examples 


In this section, we present two examples illustrating the applicability of previously 
developed techniques to real data. Most of the methods are illustrated somewhere in 
the course of the discussions. We make no attempt to implement every method for 
each example. 


EXAMPLE 11.10 In his Ph.D. thesis, H. Behbahani examined the effect of varying the water/cement
ratio on the strength of concrete that had been aged 28 days. For concrete with a
cement content of 200 pounds per cubic yard, he obtained the data presented in
Table 11.4.⁹ Let Y denote the strength and x denote the water/cement ratio.


a Fit the model $E(Y) = \beta_0 + \beta_1 x$.

b Test $H_0: \beta_1 = 0$ versus $H_a: \beta_1 < 0$ with $\alpha = .05$. (Notice that if $H_0$ is rejected
we conclude that $\beta_1 < 0$ and that the strength tends to decrease with an increase
in water/cement ratio.) Identify the corresponding attained significance level.

c Find a 90% confidence interval for the expected strength of concrete when the
water/cement ratio is 1.5. What will happen to the confidence interval if we try to
estimate mean strengths for water/cement ratios of .3 or 2.7?


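Solution a Using the formulas developed in Section 11.3, we obtain

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i = 8.709 - \frac{1}{6}(8.74)(6.148) = -.247,$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 12.965 - \frac{1}{6}(8.74)^2 = .234,$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 6.569 - \frac{1}{6}(6.148)^2 = .269,$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-0.247}{0.234} = -1.056,$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{6.148}{6} - (-1.056)\left(\frac{8.74}{6}\right) = 2.563.$$

(Throughout this example, all calculations are carried out to three decimal places.)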


9. Source: Data adapted from Hamid Behbahani, “Econocrete—Design and Properties” (Ph.D. thesis, 
University of Florida, 1977), p. 95. 
Table 11.4 Data for Example 11.10


Water/Cement Ratio    Strength (100 ft/lb)
1.21                  1.302
1.29                  1.231
1.37                  1.061
1.46                  1.040
1.62                  .803
1.79                  .711


Thus, the straight-line model that best fits the data is
$$\hat{y} = 2.563 - 1.056x.$$


b Because we desire to test whether there is evidence that $\beta_1 < 0$ with $\alpha = .05$, the
appropriate test statistic is

$$t = \frac{\hat{\beta}_1 - 0}{s\sqrt{c_{11}}} \quad \text{or} \quad t = \frac{\hat{\beta}_1 - 0}{s\sqrt{1/S_{xx}}}.$$

For this simple linear regression model,

$$\text{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = .269 - (-1.056)(-.247) = .008,$$

and, hence,

$$s = \sqrt{s^2} = \sqrt{\frac{\text{SSE}}{n - 2}} = \sqrt{\frac{.008}{4}} = .045.$$

Thus, the value of the appropriate test statistic for testing $H_0: \beta_1 = 0$ versus
$H_a: \beta_1 < 0$ is

$$t = \frac{-1.056 - 0}{.045\sqrt{1/.234}} = -11.355.$$

Because this statistic is based on n − 2 = 4 df and the appropriate rejection
region is $t < -t_{.05} = -2.132$, we reject $H_0$ in favor of $H_a$ at the $\alpha = .05$
level of significance. The appropriate test is a lower-tail test, and p-value =
P(t < −11.355), where t has a t distribution with 4 df. Table 5, Appendix
3, applies to give p-value < .005. In fact, the applet Student's t Probabilities
and Quantiles gives p-value = P(t < −11.355) = P(t > 11.355) =
.00017, a value considerably smaller than .005. Hence, for most commonly used
values of $\alpha$, we conclude that there is evidence to indicate that strength
decreases with an increase in the water/cement ratio on the region where the
experiment was conducted. From a practical point of view, the water/cement ratio
must be large enough to moisten the cement, sand, and other components that
make up concrete. But if the water/cement ratio gets too large, the concrete will
be useless.

c Because the model that we are using is a simple linear regression model, the
confidence interval can be obtained from the formula

$$\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$
We want a confidence interval when x = 1.5; therefore, $x^* = 1.5$ and

$$\hat{\beta}_0 + \hat{\beta}_1 x^* = 2.563 - (1.056)(1.5) = .979.$$

Using calculations from parts (a) and (b), we obtain the desired 90% confidence
interval:

$$.979 \pm (2.132)(.045)\sqrt{\frac{1}{6} + \frac{(1.5 - 1.457)^2}{.234}}, \quad \text{or} \quad (.938, 1.020).$$

Thus, we would estimate the mean strength of concrete with a water/cement
ratio of 1.5 to be between .938 and 1.020.

We can see from the variance expression that the confidence interval gets wider
as $x^*$ gets farther from $\bar{x} = 1.457$. Also, the values $x^* = .3$ and $x^* = 2.7$ are
far from the values that were used in the experiment. Considerable caution should
be used before constructing a confidence interval for E(Y) when the values of $x^*$
are far removed from the experimental region. Water/cement ratios of .3 and 2.7
would probably yield concrete that is utterly useless!
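
The three parts of Example 11.10 can be retraced in a few lines of code. The sketch below is illustrative only and rests on stated assumptions: the final strength value is taken to be .711, the value implied by the reported total $\sum y_i = 6.148$; the critical value 2.132 is copied from the example rather than computed; and all variable names are hypothetical. Because the code does not round intermediate results to three decimal places, its output differs slightly from the hand calculation (for instance, the slope is about −1.054 rather than −1.056).

```python
import math

# Table 11.4: water/cement ratio (x) and strength (y).
# ASSUMPTION: the last strength, 0.711, is inferred from the reported sum 6.148.
x = [1.21, 1.29, 1.37, 1.46, 1.62, 1.79]
y = [1.302, 1.231, 1.061, 1.040, 0.803, 0.711]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Part (a): least-squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Part (b): lower-tail t test of H0: beta1 = 0 versus Ha: beta1 < 0.
sse = s_yy - b1 * s_xy
s = math.sqrt(sse / (n - 2))
t_stat = (b1 - 0) / (s * math.sqrt(1 / s_xx))
T_05_4DF = 2.132                     # t_.05 with 4 df, as quoted in the example
reject_h0 = t_stat < -T_05_4DF

# Part (c): 90% confidence interval for E(Y) at x* = 1.5.
x_star = 1.5
center = b0 + b1 * x_star
half_width = T_05_4DF * s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)
ci = (center - half_width, center + half_width)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"90% CI for E(Y) at x* = 1.5: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Changing `x_star` to .3 or 2.7 shows the widening of the interval discussed above, because the term $(x^* - \bar{x})^2 / S_{xx}$ grows quickly outside the experimental region.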


In many real-world situations, the most appropriate deterministic component of
a model is not linear. For example, many populations of plants or animals tend to
grow at exponential rates. If $Y_t$ denotes the size of the population at time t, we might
employ the model

$$E(Y_t) = \alpha_0 e^{\alpha_1 t}.$$

Although this expression is not linear in the parameters $\alpha_0$ and $\alpha_1$, it can be linearized
by taking natural logarithms. If $Y_t$ can be observed for various values of t, we can
write the model as

$$\ln Y_t = \ln \alpha_0 + \alpha_1 t + \varepsilon$$

and estimate $\ln \alpha_0$ and $\alpha_1$ by the method of least squares.

Other basic models can also be linearized. In the biological sciences, it is sometimes
possible to relate the weight (or volume) of an organism to some linear measurement
such as length (or weight). If W denotes weight and l length, the model

$$E(W) = \alpha_0 l^{\alpha_1}$$

for unknown $\alpha_0$ and $\alpha_1$ is often applicable. (This model is known as an allometric
equation.) If we want to relate the weight of randomly selected organisms to observable
fixed lengths, we can take logarithms and obtain the linear model

$$\ln W = \ln \alpha_0 + \alpha_1 \ln l + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$

with $x = \ln l$. Then, $\beta_0 = \ln \alpha_0$ and $\beta_1 = \alpha_1$ can be estimated by the method of least
squares. The following example illustrates such a model.


EXAMPLE 11.11 In the data set of Table 11.5, W denotes the weight (in pounds) and l the length (in
inches) for 15 alligators captured in central Florida. Because l is easier to observe
(perhaps from a photograph) than W for alligators in their natural habitat, we want to
construct a model relating weight to length. Such a model can then be used to predict
the weights of alligators of specified lengths. Fit the model

$$\ln W = \ln \alpha_0 + \alpha_1 \ln l + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$

to the data. Find a 90% prediction interval for W if ln l is observed to be 4.00.


Table 11.5 Data for Example 11.11


Alligator   x = ln l   y = ln W

1           3.87       4.87
2           3.61       3.93
3           4.33       6.46
4           3.43       3.33
5           3.81       4.38
6           3.83       4.70
7           3.46       3.50
8           3.76       4.50
9           3.50       3.58
10          3.58       3.64
11          4.19       5.90
12          3.78       4.43
13          3.71       4.38
14          3.73       4.42
15          3.78       4.25


Solution We begin by calculating the quantities that have routine application throughout our
solution:
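$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i = 251.9757 - \frac{1}{15}(56.37)(66.27) = 2.933,$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 212.6933 - \frac{1}{15}(56.37)^2 = 0.8548,$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 303.0409 - \frac{1}{15}(66.27)^2 = 10.26,$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{2.933}{0.8548} = 3.4312,$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{66.27}{15} - (3.4312)\left(\frac{56.37}{15}\right) = -8.476.$$

We can now estimate $\alpha_0$ by
$$\hat{\alpha}_0 = e^{\hat{\beta}_0} = e^{-8.476} = .0002$$
and $\alpha_1$ by $\hat{\alpha}_1 = \hat{\beta}_1$ to arrive at the estimated model
$$\hat{W} = \hat{\alpha}_0 l^{\hat{\alpha}_1} = (.0002)\, l^{3.4312}.$$


(In many cases, $\alpha_1$ will be close to 3 because weight or volume is often roughly
proportional to the cube of a linear measurement.)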
For these data, SSE = .1963, n = 15, and $s = \sqrt{\text{SSE}/(n - 2)} = .123$. The calculations
leading to these numerical values are completely analogous to the calculations
of Example 11.10.

To find a prediction interval for W, where $x = \ln l = 4$, we must first form a
prediction interval for $Y = \ln W$. As before, the prediction interval is

$$\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{.05}\, S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}},$$

where $t_{.05}$ is based on n − 2 = 13 df. Therefore, $t_{.05} = 1.771$ and the 90% prediction
interval for $Y = \ln W$ is

$$-8.476 + 3.4312(4) \pm 1.771(.123)\sqrt{1 + \frac{1}{15} + \frac{(4 - 3.758)^2}{.8548}},$$

$$5.2488 \pm .2321,$$
or
(5.0167, 5.4809).


Because $\hat{Y} = \ln \hat{W}$, we can predict W by $e^{\hat{Y}} = e^{5.2488} = 190.3377$. The observed
90% prediction interval for W is

$$\left(e^{5.0167}, e^{5.4809}\right), \quad \text{or} \quad (150.9125, 240.0627).$$

When $x = \ln l = 4$, then $l = e^4 = 54.598$. Thus, for an alligator of length 54.598
inches, we predict that its weight will fall between 150.91 and 240.06 pounds. The
relatively narrow interval on the natural logarithm scale becomes a rather wide interval
when transformed to the original scale.
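
The log-log fit and the back-transformed prediction interval of Example 11.11 can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' computation: the variable names are hypothetical, the critical value 1.771 is copied from the example, and intermediate results are not rounded, so the printed endpoints differ from (150.91, 240.06) in the first decimal place.

```python
import math

# Table 11.5: x = ln(length) and y = ln(weight) for the 15 alligators.
ln_length = [3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
             3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78]
ln_weight = [4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
             3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25]
n = len(ln_length)

x_bar = sum(ln_length) / n
y_bar = sum(ln_weight) / n
s_xx = sum(v ** 2 for v in ln_length) - sum(ln_length) ** 2 / n
s_yy = sum(v ** 2 for v in ln_weight) - sum(ln_weight) ** 2 / n
s_xy = (sum(a * b for a, b in zip(ln_length, ln_weight))
        - sum(ln_length) * sum(ln_weight) / n)

b1 = s_xy / s_xx                      # estimates alpha_1
b0 = y_bar - b1 * x_bar               # estimates ln(alpha_0)
alpha0_hat = math.exp(b0)

sse = s_yy - b1 * s_xy
s = math.sqrt(sse / (n - 2))

# 90% prediction interval for Y = ln W at x* = 4, then back-transform to W.
T_05_13DF = 1.771                     # t_.05 with 13 df, as quoted in the example
x_star = 4.0
center = b0 + b1 * x_star
half_width = T_05_13DF * s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / s_xx)
log_interval = (center - half_width, center + half_width)
weight_interval = tuple(math.exp(v) for v in log_interval)

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, alpha0_hat = {alpha0_hat:.4f}")
print(f"PI for ln W: ({log_interval[0]:.4f}, {log_interval[1]:.4f})")
print(f"PI for W:    ({weight_interval[0]:.2f}, {weight_interval[1]:.2f})")
```

Exponentiating the endpoints is valid because the exponential function is increasing, but the resulting interval for W is no longer symmetric about the point prediction.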
The data presented and analyzed in this section are examples from real experiments;
methods developed in previous sections of this chapter were applied to produce
answers of actual interest to experimenters. Through Example 11.11, we have demonstrated
how the theory of linear models sometimes can be applied after transformation
of the scale of the original variables. Of course, not all models can be linearized, but
numerous techniques for nonlinear least-squares estimation are available.


Exercises 


Refer to Example 11.10. Find a 90% prediction interval for the strength of concrete when the 
water/cement ratio is 1.5. 


Refer to Example 11.11. Calculate the correlation coefficient r between the variables ln W and
ln l. What proportion of the variation in y = ln W is explained by x = ln l?


It is well known that large bodies of water have a mitigating effect on the temperature of the
surrounding land masses. On a cold night in central Florida, temperatures were recorded at
equal distances along a transect running downwind from a large lake. The resulting data are
given in the accompanying table.


Site (x)   Temperature °F (y)

1          37.00
2          36.25
3          35.41
4          34.92
5          34.52
6          34.45
7          34.40
8          34.00
9          33.62
10         33.90


Notice that the temperatures drop rapidly and then level off as we move away from the lake.
The suggested model for these data is

$$E(Y) = \alpha_0 e^{-\alpha_1 x}.$$

a Linearize the model and estimate the parameters by the method of least squares.


b Find a 90% confidence interval for $\alpha_0$. Give an interpretation of the result.


*11.64 Refer to Exercise 11.14. One model proposed for these data on the proportion of survivors of
thermal pollution is

$$E(Y) = \exp(-\alpha_0 x^{\alpha_1}).$$

Linearize this model and estimate the parameters by using the method of least squares and the
data of Exercise 11.14. (Omit the observation with y = 1.00.)


*11.65 In the biological and physical sciences, a common model for proportional growth over time is

$$E(Y) = 1 - e^{-\beta t},$$

where Y denotes a proportion and t denotes time. Y might represent the proportion of eggs
that hatch, the proportion of an organism filled with diseased cells, the proportion of patients
reacting to a drug, or the proportion of a liquid that has passed through a porous medium. With
n observations of the form $(y_i, t_i)$, outline how you would estimate and then form a confidence
interval for $\beta$.


11.10 Fitting the Linear Model by Using Matrices


Thus far in this chapter, we have dealt almost exclusively with simple linear regression 
models that have enabled us to express our derivations and results by using ordinary 
algebraic expressions. The only practical way to handle analogous derivations and 
results for multiple linear regression models is through the use of matrix algebra. In 
this section, we use matrices to re-express some of our previous results and to extend 
these results to the multiple linear regression model. 

Suppose that we have the linear model 


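$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

and we make n independent observations, $y_1, y_2, \ldots, y_n$, on Y. We can write the
observation $y_i$ as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,$$

where $x_{ij}$ is the setting of the jth independent variable for the ith observation,
i = 1, 2, ..., n. We now define the following matrices, with $x_0 = 1$:

$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} x_0 & x_{11} & x_{12} & \cdots & x_{1k} \\ x_0 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ x_0 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

Thus, the n equations representing $y_i$ as a function of the x's, $\beta$'s, and $\varepsilon$'s can be
simultaneously written as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$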


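(See Appendix 1 for a discussion of matrix operations.)
For n observations from a simple linear model of the form

$$Y = \beta_0 + \beta_1 x + \varepsilon,$$

we have

$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.$$

(We suppress the second subscript on x because only one x variable is involved.) The
least-squares equations for $\beta_0$ and $\beta_1$ were given in Section 11.3 as

$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,$$

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.$$

Because

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$

and

$$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix},$$

if

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix},$$

we see that the least-squares equations are given by

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}.$$

Hence,

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}.$$

Although we have shown that this result holds only for a simple case, it can be shown
that in general the least-squares equations and solutions presented in matrix notation
are as follows.


Least-Squares Equations and Solutions for a General Linear Model
Equations: $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$.
Solutions: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.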

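EXAMPLE 11.12 Solve Example 11.1 by using matrix operations.


Solution From the data given in Example 11.1, we see that

$$\mathbf{Y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix} \quad \text{and} \quad
\mathbf{X} = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix},$$

where the columns of X correspond to $x_0$ and $x_1$. It follows that

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 5 & 0 \\ 0 & 10 \end{bmatrix}, \quad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}.$$

Thus,

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix} \begin{bmatrix} 5 \\ 7 \end{bmatrix} = \begin{bmatrix} 1 \\ .7 \end{bmatrix},$$

or $\hat{\beta}_0 = 1$ and $\hat{\beta}_1 = .7$. Thus,
$$\hat{y} = 1 + .7x,$$
just as in Example 11.1.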
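EXAMPLE 11.13 Fit a parabola to the data of Example 11.1, using the model
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon.$$


Solution The X matrix for this example differs from that of Example 11.12 only by the addition
of a third column corresponding to $x^2$. (Notice that $x_1 = x$, $x_2 = x^2$, and k = 2 in
the notation of the general linear model.) Thus,

$$\mathbf{Y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}.$$

(The columns of the X matrix correspond, in order, to the three variables $x_0$, $x$, and $x^2$.)
Thus, for the first measurement, y = 0, $x_0 = 1$, x = −2, and $x^2 = 4$; and
for the second measurement, y = 0, $x_0 = 1$, x = −1, and $x^2 = 1$. Succeeding rows
of the Y and X matrices are obtained in a similar manner.
The matrix products X'X and X'Y are

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 1 & 2 \\ 4 & 1 & 0 & 1 & 4 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix},$$

$$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 1 & 2 \\ 4 & 1 & 0 & 1 & 4 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}
= \begin{bmatrix} 5 \\ 7 \\ 13 \end{bmatrix}.$$

We omit the process of inverting X'X and simply state that the inverse matrix is equal to

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 17/35 & 0 & -1/7 \\ 0 & 1/10 & 0 \\ -1/7 & 0 & 1/14 \end{bmatrix}.$$

[You may verify that $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$.]
Finally,

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} 17/35 & 0 & -1/7 \\ 0 & 1/10 & 0 \\ -1/7 & 0 & 1/14 \end{bmatrix}
\begin{bmatrix} 5 \\ 7 \\ 13 \end{bmatrix}
= \begin{bmatrix} 4/7 \\ 7/10 \\ 3/14 \end{bmatrix}
\approx \begin{bmatrix} .571 \\ .700 \\ .214 \end{bmatrix}.$$

Hence, $\hat{\beta}_0 = .571$, $\hat{\beta}_1 = .7$, and $\hat{\beta}_2 = .214$, and the prediction equation is
$$\hat{y} = .571 + .7x + .214x^2.$$
A graph of this parabola on Figure 11.6 will indicate a good fit to the data points.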
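
The normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$ of Examples 11.12 and 11.13 can also be solved mechanically. The sketch below is one possible illustration, not the text's method: it uses exact rational arithmetic and a small Gauss-Jordan routine, and the helper names `transpose`, `matmul`, `solve`, and `least_squares` are hypothetical. It assumes X'X is nonsingular.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination (a square, b a column matrix).

    Assumes a is nonsingular; a singular matrix makes the pivot search fail.
    """
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i][0])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def least_squares(x_matrix, y):
    """Return beta-hat solving (X'X) beta = X'Y."""
    xt = transpose(x_matrix)
    return solve(matmul(xt, x_matrix), matmul(xt, [[v] for v in y]))


# Data of Example 11.1.
x = [-2, -1, 0, 1, 2]
y = [0, 0, 1, 1, 3]

line = least_squares([[1, xi] for xi in x], y)                 # Example 11.12
parabola = least_squares([[1, xi, xi ** 2] for xi in x], y)    # Example 11.13
print([str(b) for b in line])
print([str(b) for b in parabola])
```

Exact fractions are used here only so that the results can be compared directly with the entries 4/7, 7/10, and 3/14 above; in practice a numerical linear algebra library would be the usual choice.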
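The expressions for $V(\hat{\beta}_0)$, $V(\hat{\beta}_1)$, $\text{Cov}(\hat{\beta}_0, \hat{\beta}_1)$, and SSE that we derived in
Section 11.4 for the simple linear regression model can be expressed conveniently
in terms of matrices. We have seen that for the linear model $Y = \beta_0 + \beta_1 x + \varepsilon$, X'X
is given by

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}.$$

It can be shown that

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{\sum x_i^2}{n S_{xx}} & -\dfrac{\bar{x}}{S_{xx}} \\[2ex] -\dfrac{\bar{x}}{S_{xx}} & \dfrac{1}{S_{xx}} \end{bmatrix}
= \begin{bmatrix} c_{00} & c_{01} \\ c_{10} & c_{11} \end{bmatrix}.$$

By checking the variances and covariances derived in Section 11.4, you can see that
$$V(\hat{\beta}_i) = c_{ii}\sigma^2, \quad i = 0, 1$$
and
$$\text{Cov}(\hat{\beta}_0, \hat{\beta}_1) = c_{01}\sigma^2 = c_{10}\sigma^2.$$


Recall that an unbiased estimator for $\sigma^2$, the variance of the error term $\varepsilon$, is given
by $S^2 = \text{SSE}/(n - 2)$. A bit of matrix algebra will show that $\text{SSE} = \sum (y_i - \hat{y}_i)^2$ can
be expressed as

$$\text{SSE} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}.$$

(Notice that $\mathbf{Y}'\mathbf{Y} = \sum Y_i^2$.)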


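EXAMPLE 11.14 Find the variances of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ for Example 11.12 and provide an
estimator for $\sigma^2$.


Solution In Example 11.12, we found that

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}.$$

Hence,
$$V(\hat{\beta}_0) = c_{00}\sigma^2 = (1/5)\sigma^2,$$
$$V(\hat{\beta}_1) = c_{11}\sigma^2 = (1/10)\sigma^2.$$

As before, $\text{Cov}(\hat{\beta}_0, \hat{\beta}_1) = 0$ in this case because $\sum x_i = 0$. For these data,

$$\mathbf{Y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 \\ .7 \end{bmatrix}.$$

Hence,

$$\text{SSE} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} 0 & 0 & 1 & 1 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}
- \begin{bmatrix} 1 & .7 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}$$

$$= 11 - \begin{bmatrix} 1 & .7 \end{bmatrix} \begin{bmatrix} 5 \\ 7 \end{bmatrix} = 11 - 9.9 = 1.1.$$

Then,

$$s^2 = \frac{\text{SSE}}{n - 2} = \frac{1.1}{5 - 2} = \frac{1.1}{3} = .367.$$

Notice the agreement with the results that were obtained in Examples 11.2 and 11.3.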
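
The quantities of Example 11.14 can be reproduced with a short script. This is a sketch under stated assumptions: the helper names `transpose`, `matmul`, and `inverse` are hypothetical, exact fractions are used so the output can be compared with 1/5, 1/10, and 1.1/3, and X'X is assumed to be nonsingular.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Invert a square nonsingular matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Data of Example 11.1, straight-line model of Example 11.12.
x = [-2, -1, 0, 1, 2]
y = [0, 0, 1, 1, 3]
n = len(y)

X = [[1, xi] for xi in x]
Y = [[yi] for yi in y]
Xt = transpose(X)
XtY = matmul(Xt, Y)

C = inverse(matmul(Xt, X))                 # (X'X)^{-1}; c_ii * sigma^2 = V(beta_i-hat)
beta_hat = matmul(C, XtY)                  # column vector of estimates
yty = sum(yi * yi for yi in y)             # Y'Y
sse = yty - matmul(transpose(beta_hat), XtY)[0][0]   # SSE = Y'Y - beta-hat' X'Y
s2 = sse / (n - 2)                         # estimate of sigma^2

print("c00 =", C[0][0], " c11 =", C[1][1], " c01 =", C[0][1])
print("SSE =", sse, " s^2 =", float(s2))
```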


Exercises 


11.66 Refer to Exercise 11.3. Fit the model suggested there by use of matrices. 


11.67 Use the matrix approach to fit a straight line to the data in the accompanying table, plot the
points, and then sketch the fitted line as a check on the calculations. The data points are the
same as for Exercises 11.3 and 11.66 except that they are translated 1 unit in the positive
direction along the x-axis. What effect does symmetric spacing of the x-values about x = 0
have on the form of the (X'X) matrix and the resulting calculations?


mee NU 
шо о н OF 


11.68 Fit the quadratic model $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ to the data points in the following table. Plot
the points and sketch the fitted parabola as a check on the calculations.
The manufacturer of Lexus automobiles has steadily increased sales since the 1989 launch of
that brand in the United States. However, the rate of increase changed in 1996 when Lexus
introduced a line of trucks. The sales of Lexus vehicles from 1996 to 2003 are shown in the
accompanying table.¹⁰


x      y
1996   18.5
1997   22.6
1998   27.2
1999   31.2
2000   33.0
2001   44.9
2002   49.4
2003   35.0


a Letting Y denote sales and x denote the coded year (−7 for 1996, −5 for 1997, through 7
for 2003), fit the model $Y = \beta_0 + \beta_1 x + \varepsilon$.


b For the same data, fit the model $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.


a Calculate SSE and $S^2$ for Exercise 11.4. Use the matrix approach.
b Fit the model suggested in Exercise 11.4 for the relationship between audited values and
book values by using matrices. We can simplify the computations by defining

$$x_i^* = x_i - \bar{x}$$

and fitting the model $Y = \beta_0^* + \beta_1^* x^* + \varepsilon$. Fit this latter model and calculate SSE. Compare
your answer with the SSE calculation in part (a).


11.11 Linear Functions of the Model Parameters:
Multiple Linear Regression


All of the theoretical results of Section 11.4 can be extended to the multiple linear
regression model,

$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

Suppose that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent random variables with $E(\varepsilon_i) = 0$ and
$V(\varepsilon_i) = \sigma^2$. Then the least-squares estimators are given by

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},$$

provided that $(\mathbf{X}'\mathbf{X})^{-1}$ exists. The properties of these estimators are as follows (proof
omitted).


10. Source: Adapted from Automotive News, 26 January 2004. 
Properties of the Least-Squares Estimators: Multiple Linear Regression


1. $E(\hat{\beta}_i) = \beta_i$, i = 0, 1, ..., k.
2. $V(\hat{\beta}_i) = c_{ii}\sigma^2$, where $c_{ii}$ is the element in row i and column i of
$(\mathbf{X}'\mathbf{X})^{-1}$. (Recall that this matrix has a row and column numbered 0.)
3. $\text{Cov}(\hat{\beta}_i, \hat{\beta}_j) = c_{ij}\sigma^2$, where $c_{ij}$ is the element in row i and column j
of $(\mathbf{X}'\mathbf{X})^{-1}$.
4. An unbiased estimator of $\sigma^2$ is $S^2 = \text{SSE}/[n - (k + 1)]$, where
$\text{SSE} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$. (Notice that there are k + 1 unknown $\beta_i$ values
in the model.)


If, in addition, the $\varepsilon_i$, for i = 1, 2, ..., n are normally distributed,


5. Each $\hat{\beta}_i$ is normally distributed.
6. The random variable
$$\frac{[n - (k + 1)]S^2}{\sigma^2}$$
has a $\chi^2$ distribution with n − (k + 1) df.
7. The statistics $S^2$ and $\hat{\beta}_i$ are independent for each i = 0, 1, 2, ..., k.


11.12 Inferences Concerning Linear Functions 
of the Model Parameters: Multiple 
Linear Regression 


As discussed in Sections 11.5 and 11.6, we might be interested in making inferences
about a single $\beta_i$ or about linear combinations of the model parameters $\beta_0, \beta_1, \ldots, \beta_k$.
For example, we might wish to estimate E(Y), given by

$$E(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,$$

where E(Y) represents the mean yield of a chemical process for settings of controlled
process variables $x_1, x_2, \ldots, x_k$; or the mean profit of a corporation for various investment
expenditures $x_1, x_2, \ldots, x_k$. Properties of estimators of such linear functions are
given in this section.


Suppose that we wish to make an inference about the linear function

$$a_0\beta_0 + a_1\beta_1 + a_2\beta_2 + \cdots + a_k\beta_k,$$

where $a_0, a_1, a_2, \ldots, a_k$ are constants (some of which may equal zero). Defining the
(k + 1) × 1 matrix,

$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix},$$

it follows that a linear combination of the $\beta_0, \beta_1, \ldots, \beta_k$ corresponding to $a_0, a_1, \ldots, a_k$
may be expressed as

$$\mathbf{a}'\boldsymbol{\beta} = a_0\beta_0 + a_1\beta_1 + a_2\beta_2 + \cdots + a_k\beta_k.$$


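From now on, we will refer to such linear combinations in their matrix form. Because
$\mathbf{a}'\boldsymbol{\beta}$ is a linear combination of the model parameters, an unbiased estimator for $\mathbf{a}'\boldsymbol{\beta}$ is
given by the same linear combination of the parameter estimators. That is, by Theorem
3.12, if

$$\mathbf{a}'\hat{\boldsymbol{\beta}} = a_0\hat{\beta}_0 + a_1\hat{\beta}_1 + a_2\hat{\beta}_2 + \cdots + a_k\hat{\beta}_k,$$

then

$$E(\mathbf{a}'\hat{\boldsymbol{\beta}}) = E(a_0\hat{\beta}_0 + a_1\hat{\beta}_1 + a_2\hat{\beta}_2 + \cdots + a_k\hat{\beta}_k)
= a_0\beta_0 + a_1\beta_1 + a_2\beta_2 + \cdots + a_k\beta_k = \mathbf{a}'\boldsymbol{\beta}.$$

Applying the same theorem, we find the variance of $\mathbf{a}'\hat{\boldsymbol{\beta}}$:

$$\begin{aligned}
V(\mathbf{a}'\hat{\boldsymbol{\beta}}) &= V(a_0\hat{\beta}_0 + a_1\hat{\beta}_1 + a_2\hat{\beta}_2 + \cdots + a_k\hat{\beta}_k) \\
&= a_0^2 V(\hat{\beta}_0) + a_1^2 V(\hat{\beta}_1) + a_2^2 V(\hat{\beta}_2) + \cdots + a_k^2 V(\hat{\beta}_k) \\
&\quad + 2a_0a_1\text{Cov}(\hat{\beta}_0, \hat{\beta}_1) + 2a_0a_2\text{Cov}(\hat{\beta}_0, \hat{\beta}_2) \\
&\quad + \cdots + 2a_1a_2\text{Cov}(\hat{\beta}_1, \hat{\beta}_2) + \cdots + 2a_{k-1}a_k\text{Cov}(\hat{\beta}_{k-1}, \hat{\beta}_k),
\end{aligned}$$

where $V(\hat{\beta}_i) = c_{ii}\sigma^2$ and $\text{Cov}(\hat{\beta}_i, \hat{\beta}_j) = c_{ij}\sigma^2$. You may verify that $V(\mathbf{a}'\hat{\boldsymbol{\beta}})$ is
given by

$$V(\mathbf{a}'\hat{\boldsymbol{\beta}}) = [\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}]\sigma^2.$$

Finally, recalling that $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ are normally distributed in repeated sampling
(Section 11.11), it is clear that $\mathbf{a}'\hat{\boldsymbol{\beta}}$ is a linear function of normally distributed
random variables and hence itself is normally distributed in repeated sampling.
Because $\mathbf{a}'\hat{\boldsymbol{\beta}}$ is normally distributed with

$$E(\mathbf{a}'\hat{\boldsymbol{\beta}}) = \mathbf{a}'\boldsymbol{\beta}$$

and $V(\mathbf{a}'\hat{\boldsymbol{\beta}}) = [\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}]\sigma^2$, we conclude that

$$Z = \frac{\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta}}{\sqrt{V(\mathbf{a}'\hat{\boldsymbol{\beta}})}} = \frac{\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta}}{\sigma\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}}$$

has a standard normal distribution and could be employed to test a hypothesis

$$H_0: \mathbf{a}'\boldsymbol{\beta} = (\mathbf{a}'\boldsymbol{\beta})_0$$

when $(\mathbf{a}'\boldsymbol{\beta})_0$ is some specified value. Likewise, a $100(1 - \alpha)\%$ confidence interval
for $\mathbf{a}'\boldsymbol{\beta}$ is

$$\mathbf{a}'\hat{\boldsymbol{\beta}} \pm z_{\alpha/2}\,\sigma\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$

Furthermore, as we might suspect, if we substitute S for $\sigma$, the quantity

$$T = \frac{\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta}}{S\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}}$$

possesses a Student's t distribution in repeated sampling, with [n − (k + 1)] df, and
provides a test statistic to test the hypothesis

$$H_0: \mathbf{a}'\boldsymbol{\beta} = (\mathbf{a}'\boldsymbol{\beta})_0.$$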


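A Test for $\mathbf{a}'\boldsymbol{\beta}$

$H_0$: $\mathbf{a}'\boldsymbol{\beta} = (\mathbf{a}'\boldsymbol{\beta})_0$.

$H_a$: $\mathbf{a}'\boldsymbol{\beta} > (\mathbf{a}'\boldsymbol{\beta})_0$, or $\mathbf{a}'\boldsymbol{\beta} < (\mathbf{a}'\boldsymbol{\beta})_0$, or $\mathbf{a}'\boldsymbol{\beta} \ne (\mathbf{a}'\boldsymbol{\beta})_0$.

Test statistic: $T = \dfrac{\mathbf{a}'\hat{\boldsymbol{\beta}} - (\mathbf{a}'\boldsymbol{\beta})_0}{S\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}}$.

Rejection region: $t > t_\alpha$, or $t < -t_\alpha$, or $|t| > t_{\alpha/2}$, respectively.

Here, $t_\alpha$ is based on [n − (k + 1)] df.


The corresponding $100(1 - \alpha)\%$ confidence interval for $\mathbf{a}'\boldsymbol{\beta}$ is as follows.


A $100(1 - \alpha)\%$ Confidence Interval for $\mathbf{a}'\boldsymbol{\beta}$

$$\mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2}\, S\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$

As earlier, the tabulated $t_{\alpha/2}$ in this formula is based on [n − (k + 1)] df.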
Although we usually do not think of a single $\beta_i$ as a linear combination of
$\beta_0, \beta_1, \ldots, \beta_k$, if we choose

$$a_j = \begin{cases} 1, & \text{if } j = i, \\ 0, & \text{if } j \ne i, \end{cases}$$

then $\beta_i = \mathbf{a}'\boldsymbol{\beta}$ for this choice of a. In Exercise 11.71, you will show that with this
choice of a, $\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a} = c_{ii}$, where $c_{ii}$ is the element in row i and column i of
$(\mathbf{X}'\mathbf{X})^{-1}$. This fact greatly simplifies the form of both the test statistic and confidence
intervals that can be used to make inferences about an individual $\beta_i$.

As previously indicated, one useful application of the hypothesis-testing and confidence
interval techniques just presented is to the problem of estimating the mean
value of Y, E(Y), for fixed values of the independent variables $x_1, x_2, \ldots, x_k$. In
particular, if $x_i^*$ denotes a specific value of $x_i$, for i = 1, 2, ..., k, then

$$E(Y) = \beta_0 + \beta_1 x_1^* + \beta_2 x_2^* + \cdots + \beta_k x_k^*.$$

Notice that E(Y) is a special case of $a_0\beta_0 + a_1\beta_1 + \cdots + a_k\beta_k = \mathbf{a}'\boldsymbol{\beta}$ with $a_0 = 1$
and $a_i = x_i^*$, for i = 1, 2, ..., k. Thus, an inference about E(Y) when $x_i = x_i^*$, for
i = 1, 2, ..., k, can be made by using the techniques developed earlier for general
linear combinations of the $\beta$'s.
We illustrate with two examples.


EXAMPLE 11.15 Do the data of Example 11.1 present sufficient evidence to indicate curvature in the
response function? Test using $\alpha = .05$ and give bounds to the attained significance
level.


Solution The preceding question assumes that the probabilistic model is a realistic description
of the true response and implies a test of the hypothesis $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \ne 0$
in the linear model $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ that was fit to the data in Example
11.13. (If $\beta_2 = 0$, the quadratic term will not appear and the expected value of Y will
represent a straight-line function of x.) The first step in the solution is to calculate
SSE and $s^2$:

$$\text{SSE} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = 11 - \begin{bmatrix} .571 & .700 & .214 \end{bmatrix} \begin{bmatrix} 5 \\ 7 \\ 13 \end{bmatrix} = 11 - 10.537 = .463,$$

so then

$$s^2 = \frac{\text{SSE}}{n - 3} = \frac{.463}{2} = .232 \quad \text{and} \quad s = .48.$$

(Notice that the model contains three parameters and, hence, SSE is based on n − 3 =
2 df.) The parameter $\beta_2$ is a linear combination of $\beta_0$, $\beta_1$, and $\beta_2$ with $a_0 = 0$, $a_1 = 0$,
and $a_2 = 1$. For this choice of a, we have $\beta_2 = \mathbf{a}'\boldsymbol{\beta}$ and $\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a} = c_{22}$.
The calculations in Example 11.13 yielded $\hat{\beta}_2 = 3/14 \approx .214$ and $c_{22} = 1/14$.
The appropriate test statistic can therefore be written as

$$t = \frac{\hat{\beta}_2 - 0}{s\sqrt{c_{22}}} = \frac{.214}{.48\sqrt{1/14}} = 1.67.$$

If we take $\alpha = .05$, the value of $t_{\alpha/2} = t_{.025}$ for 2 df is 4.303, and the rejection
region is

reject if |t| > 4.303.


Because the absolute value of the calculated value of t is less than 4.303, we cannot
reject the null hypothesis that $\beta_2 = 0$. We do not accept $H_0: \beta_2 = 0$ because we
would need to know the probability of making a type II error (that is, the probability
of falsely accepting $H_0$ for a specified alternative value of $\beta_2$) before we could make
a statistically sound decision to accept. Because the test is two-tailed, p-value =
2P(t > 1.67), where t has a t distribution with 2 df. Using Table 5, Appendix 3, we
find that P(t > 1.67) > .10. Thus, we conclude that p-value > .2. More precisely,
the applet Student's t Probabilities and Quantiles can be used to establish that
p-value = 2P(t > 1.67) = 2(.11843) = .23686. Unless we are willing to work with a
relatively large value of $\alpha$ (at least .23686), we cannot reject $H_0$. Again we notice the
agreement between the conclusions reached by the formal (fixed $\alpha$) test procedure
and the proper interpretation of the attained significance level.
As a further step in the analysis, we could look at the width of a confidence interval
for $\beta_2$ to see whether it is short enough to detect a departure from zero that would be
of practical significance. The resulting 95% confidence interval for $\beta_2$ is

\[ \hat\beta_2 \pm t_{.025}\, s\sqrt{c_{22}}. \]

Substituting, we get

\[ .214 \pm (4.303)(.48)\sqrt{1/14}, \quad\text{or}\quad .214 \pm .552. \]

Thus, the confidence interval for $\beta_2$ is quite wide, suggesting that the experimenter
needs to collect more data before reaching a decision.
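The arithmetic of Example 11.15 can be retraced with a few lines of Python. The sketch below is only an illustration: the variable names are our own, the inputs are the values quoted in the example, and the tail probability uses the closed-form cdf that is available for a $t$ distribution with exactly 2 df (for other degrees of freedom a table or statistical software would be needed). Because the code carries unrounded intermediate values, its results can differ from the hand calculation in the third decimal place.

```python
import math

# Values quoted in Example 11.15 (the fit itself comes from Example 11.13).
sse = 0.463            # SSE for the quadratic model
n, n_params = 5, 3     # five observations, three parameters
beta2_hat = 3 / 14     # least-squares estimate of beta_2
c22 = 1 / 14           # diagonal element of (X'X)^{-1} belonging to beta_2
t_crit = 4.303         # t_.025 with 2 df (Table 5, Appendix 3)

df = n - n_params
s = math.sqrt(sse / df)
t_stat = beta2_hat / (s * math.sqrt(c22))


def t2_upper_tail(t):
    """P(T > t) for a t distribution with 2 df only (closed-form cdf)."""
    return 0.5 - t / (2.0 * math.sqrt(2.0 + t * t))


p_value = 2 * t2_upper_tail(abs(t_stat))      # two-tailed test
half_width = t_crit * s * math.sqrt(c22)      # 95% CI half-width

print(f"s = {s:.3f}, t = {t_stat:.2f}, p-value = {p_value:.3f}")
print(f"95% CI for beta_2: {beta2_hat:.3f} +/- {half_width:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "cannot reject H0")
```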


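EXAMPLE 11.16 For the data of Example 11.1, find a 90% confidence interval for $E(Y)$ when $x = 1$.


Solution For the model of Example 11.1,

\[
E(Y) = \beta_0 + \beta_1 x = \mathbf{a}'\boldsymbol{\beta}, \quad\text{with}\quad
\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 1 \\ x \end{bmatrix}.
\]

The desired confidence interval is given by

\[ \mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2}\, S\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}. \]

In Example 11.12, we determined that

\[
\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 \\ .7 \end{bmatrix} \quad\text{and}\quad
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}.
\]

Because we are interested in $x = 1$,

\[
\mathbf{a} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\mathbf{a}'\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ .7 \end{bmatrix} = 1.7,
\]

\[
\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a} =
\begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix} = .3.
\]

In Example 11.14, we found $s^2$ to be .367, or $s = .606$ for these data. The value
of $t_{.05}$ with $n - 2 = 3$ df is 2.353, and the required 90% confidence interval for $E(Y)$
is given by

\[ 1.7 \pm (2.353)(.606)\sqrt{.3}, \quad\text{or}\quad 1.7 \pm .781. \]

Our answer here is the same as that obtained in Example 11.6 without the use of
matrices.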
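As a check on the matrix arithmetic in Example 11.16, the following Python sketch evaluates $\mathbf{a}'\hat{\boldsymbol{\beta}}$ and $\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}$ directly. The helper names `quad_form` and `mean_interval` are hypothetical, chosen for this illustration; the numerical inputs are the ones quoted from Examples 11.12 and 11.14.

```python
import math


def quad_form(a, m):
    """Return a' M a for a vector a and a square matrix M (nested lists)."""
    size = len(a)
    return sum(a[i] * m[i][j] * a[j] for i in range(size) for j in range(size))


beta_hat = [1.0, 0.7]                      # from Example 11.12
xtx_inv = [[1 / 5, 0.0], [0.0, 1 / 10]]    # (X'X)^{-1} from Example 11.12
s = 0.606                                  # from Example 11.14
t_crit = 2.353                             # t_.05 with 3 df


def mean_interval(x):
    """Center and half-width of the 90% confidence interval for E(Y) at x."""
    a = [1.0, x]
    center = sum(ai * bi for ai, bi in zip(a, beta_hat))
    half = t_crit * s * math.sqrt(quad_form(a, xtx_inv))
    return center, half


center, half = mean_interval(1.0)
print(f"{center:.1f} +/- {half:.3f}")
```

Calling `mean_interval` with other values of $x$ shows how the interval widens as $x$ moves away from the center of the data.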


Exercises


11.71 Consider the general linear model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \]

where $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$. Notice that $\hat\beta_i = \mathbf{a}'\hat{\boldsymbol{\beta}}$, where the vector $\mathbf{a}$ is defined by

\[ a_j = \begin{cases} 1, & \text{if } j = i, \\ 0, & \text{if } j \neq i. \end{cases} \]

Use this to verify that $E(\hat\beta_i) = \beta_i$ and $V(\hat\beta_i) = c_{ii}\sigma^2$, where $c_{ii}$ is the element in row $i$ and
column $i$ of $(\mathbf{X}'\mathbf{X})^{-1}$.


11.72 Refer to Exercise 11.69.

a Is there evidence of a quadratic effect in the relationship between $Y$ and $x$? (Test $H_0: \beta_2 = 0$.) Use $\alpha = .10$.

b Find a 90% confidence interval for $\beta_2$.


11.73 The experimenter who collected the data in Exercise 11.68 claims that the minimum value of
$E(Y)$ occurs at $x = 1$. Test this claim at the 5% significance level. [Hint: $E(Y) = \beta_0 + \beta_1 x +
\beta_2 x^2$ has its minimum at the point $x_0$, which satisfies the equation $\beta_1 + 2\beta_2 x_0 = 0$.]


11.74 An experiment was conducted to investigate the effect of four factors (temperature $T_1$, pressure
$P$, catalyst $C$, and temperature $T_2$) on the yield $Y$ of a chemical.

a The values (or levels) of the four factors used in the experiment are shown in the accompanying
table. If each of the four factors is coded to produce the four variables $x_1$, $x_2$, $x_3$, and
$x_4$, respectively, give the transformation relating each coded variable to its corresponding
original.

    T1    x1      P    x2      C    x3     T2    x4
    50    -1     10    -1      1    -1    100    -1
    70     1     20     1      2     1    200     1

b Fit the linear model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon \]

to the following table of data.

                             x4 = +1              x4 = -1
                         x3 = -1   x3 = 1     x3 = -1   x3 = 1
    x1 = -1   x2 = -1      22.2     24.5        24.4     25.9
              x2 =  1      19.4     24.1        25.2     28.4
    x1 = +1   x2 = -1      22.1     19.6        23.5     16.5
              x2 =  1      14.2     12.7        19.3     16.0

c Do the data present sufficient evidence to indicate that $T_1$ contributes information for the
estimation of $Y$? Does $P$? Does $C$? Does $T_2$? (Test the hypotheses, respectively, that $\beta_1 = 0$,
$\beta_2 = 0$, $\beta_3 = 0$, and $\beta_4 = 0$.) Give bounds for the p-value associated with each test. What
would you conclude if you used $\alpha = .01$ in each case?
Refer to Exercise 11.74. Find a 90% confidence interval for the expected yield, given that
$T_1 = 50$, $P = 20$, $C = 1$, and $T_2 = 200$.


The results that follow were obtained from an analysis of data obtained in a study to assess
the relationship between percent increase in yield ($Y$) and base saturation ($x_1$, pounds/acre),
phosphate saturation ($x_2$, BEC%), and soil pH ($x_3$). Fifteen responses were analyzed in the
study. The least-squares equation and other useful information follow.


$\hat{y} = 38.83 - 0.0092x_1 - 0.92x_2 + 11.56x_3$, $S_{yy} = 10965.46$, SSE = 1107.01,
151401.8 2.6 100.5 —28082.9 
ANNI _ 2.6 1.0 0.0 0.4 | 
do E 100.5 0.0 8.1 5.2 


—28082.9 0.4 5.2 6038.2 


a Is there sufficient evidence that, with all independent variables in the model, $\beta_1 < 0$? Test
at the $\alpha = .05$ level of significance.


b Give a 95% confidence interval for the mean percent increase in yield if $x_1 = 914$, $x_2 = 65$,
and $x_3 = 6$.


Predicting a Particular Value of Y 
by Using Multiple Regression 


In Section 11.7, we considered predicting an actual observed value of $Y$ in the simple
linear regression, setting the single independent variable $x = x^*$. The solution was
based heavily on the properties of

\[ \text{error} = Y^* - \hat{Y}^*, \]

where $\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x^*$ was observed to be a predictor of the actual value of $Y$ and an
estimator for $E(Y)$ as well. The same method will be used in this section to provide
the corresponding solution in the multiple linear regression case. Suppose that we
have fit a multiple linear regression model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \]

and that we are interested in predicting the value of $Y^*$ when $x_1 = x_1^*$, $x_2 = x_2^*$, ...,
$x_k = x_k^*$. We predict the value of $Y^*$ with

\[ \hat{Y}^* = \hat\beta_0 + \hat\beta_1 x_1^* + \hat\beta_2 x_2^* + \cdots + \hat\beta_k x_k^* = \mathbf{a}'\hat{\boldsymbol{\beta}}, \]

where

\[ \mathbf{a} = \begin{bmatrix} 1 & x_1^* & x_2^* & \cdots & x_k^* \end{bmatrix}'. \]
As in Section 11.7, we focus on the difference between the variable $Y^*$ and the
predicted value:

\[ \text{error} = Y^* - \hat{Y}^*. \]


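Because both $Y^*$ and $\hat{Y}^*$ are normally distributed, the error is normally distributed;
and using Theorem 5.12 and the results of Section 11.11, we find that

\[ E(\text{error}) = 0 \quad\text{and}\quad V(\text{error}) = \sigma^2\left[1 + \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}\right] \]

and that

\[ Z = \frac{Y^* - \hat{Y}^*}{\sigma\sqrt{1 + \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}} \]

has a standard normal distribution. Furthermore, if $S$ is substituted for $\sigma$, it can be
shown that

\[ T = \frac{Y^* - \hat{Y}^*}{S\sqrt{1 + \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}} \]

possesses a Student's $t$ distribution with $[n - (k + 1)]$ df.


Proceeding as in Section 11.7, we obtain the following $100(1 - \alpha)\%$ prediction
interval for $Y$.


A $100(1 - \alpha)\%$ Prediction Interval for $Y$ when $x_1 = x_1^*$, $x_2 = x_2^*$, ..., $x_k = x_k^*$

\[ \mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2}\, S\sqrt{1 + \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}, \]

where $\mathbf{a}' = [1, x_1^*, x_2^*, \ldots, x_k^*]$.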


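EXAMPLE 11.17 Suppose that the experiment that generated the data of Example 11.12 is to be run
again with $x = 2$. Predict the particular value of $Y$ with $1 - \alpha = .90$.


Solution In Example 11.12, we determined that

\[
\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 \\ .7 \end{bmatrix} \quad\text{and}\quad
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}.
\]

Because we are interested in $x = 2$, the desired prediction interval is given by

\[ \mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2}\, S\sqrt{1 + \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}} \]

with

\[
\mathbf{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad
\mathbf{a}'\hat{\boldsymbol{\beta}} = 1 + (.7)(2) = 2.4, \qquad
\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a} = \frac{1}{5} + \frac{4}{10} = .6.
\]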
As before, $s = .606$ for these data, and the value of $t_{.05}$ with $n - 2 = 3$ df is 2.353.
The 90% prediction interval for a future observation on $Y$ when $x = 2$ is, therefore,

\[ 2.4 \pm (2.353)(.606)\sqrt{1 + .6}, \quad\text{or}\quad 2.4 \pm 1.804. \]

Notice the agreement with the answer provided in Example 11.7 where we used
ordinary algebra rather than the matrix approach in the solution.
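The only difference between the prediction interval and the confidence interval for $E(Y)$ is the extra 1 under the square root. The short Python sketch below, which uses the values quoted in Example 11.17 and variable names of our own choosing, computes both half-widths at $x = 2$ so that the effect of that extra term can be seen side by side.

```python
import math

beta_hat = [1.0, 0.7]                      # from Example 11.12
xtx_inv = [[1 / 5, 0.0], [0.0, 1 / 10]]    # (X'X)^{-1} from Example 11.12
s, t_crit = 0.606, 2.353                   # s and t_.05 with 3 df

a = [1.0, 2.0]                             # we predict at x = 2
y_hat = sum(ai * bi for ai, bi in zip(a, beta_hat))
q = sum(a[i] * xtx_inv[i][j] * a[j] for i in range(2) for j in range(2))

ci_half = t_crit * s * math.sqrt(q)        # interval for the mean E(Y)
pi_half = t_crit * s * math.sqrt(1 + q)    # interval for a single future Y

print(f"point prediction: {y_hat:.1f}")
print(f"90% confidence interval half-width: {ci_half:.3f}")
print(f"90% prediction interval half-width: {pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it must allow for the variability of the new observation about its mean as well as for the error in estimating that mean.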


Exercises 


Refer to Exercise 11.76. Give a 95% prediction interval for the percent increase in yield in a
field with base saturation = 914 pounds/acre, phosphate saturation = 65%, and soil pH = 6.


Refer to Exercise 11.69. Find a 98% prediction interval for Lexus sales in 2004. Use the
quadratic model.


Refer to Exercises 11.74 and 11.75. Find a 90% prediction interval for $Y$ if $T_1 = 50$, $P = 20$,
$C = 1$, and $T_2 = 200$.


A Test for $H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0$


In seeking an intuitively appealing test statistic to test a hypothesis concerning a set
of parameters of the linear model, we are led to a consideration of the sum of squares
of deviations SSE. Suppose, for example, that we were to fit a model involving only a
subset of the independent variables under consideration; that is, fit a reduced model
of the form

model R: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_g x_g + \varepsilon$

to the data, and then were to calculate the sum of squares of deviations between the
observed and predicted values of $Y$, $\text{SSE}_R$. Having done this, we might fit the linear
model with all candidate independent variables present (the complete model):


model C: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_g x_g + \beta_{g+1} x_{g+1} + \cdots + \beta_k x_k + \varepsilon$

and determine the sum of squares of deviations for this model, $\text{SSE}_C$. Notice that
the complete model contains all the terms of the reduced model, model R, plus the
extra terms $x_{g+1}, x_{g+2}, \ldots, x_k$ (notice that $k > g$). If $x_{g+1}, x_{g+2}, \ldots, x_k$ contribute
a substantial quantity of information for the prediction of $Y$ that is not contained in
the variables $x_1, x_2, \ldots, x_g$ (that is, at least one of the parameters $\beta_{g+1}, \beta_{g+2}, \ldots, \beta_k$
differs from zero), what would be the relationship between $\text{SSE}_R$ and $\text{SSE}_C$? Intuitively,
we see that, if $x_{g+1}, x_{g+2}, \ldots, x_k$ are important information-contributing variables,
model C, the complete model, should predict with a smaller error of prediction
than model R. That is, $\text{SSE}_C$ should be less than $\text{SSE}_R$. The greater the difference
$(\text{SSE}_R - \text{SSE}_C)$, the stronger will be the evidence to support the alternative hypothesis
that $x_{g+1}, x_{g+2}, \ldots, x_k$ contribute information for the prediction of $Y$ and to reject the
null hypothesis

\[ H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0. \]

The decrease in the sum of squares of deviations $(\text{SSE}_R - \text{SSE}_C)$ is called the sum
of squares associated with the variables $x_{g+1}, x_{g+2}, \ldots, x_k$, adjusted for the variables
$x_1, x_2, x_3, \ldots, x_g$.


FIGURE 11.8 Partitioning $\text{SSE}_R$


We indicated that large values of $(\text{SSE}_R - \text{SSE}_C)$ would lead us to reject the
hypothesis

\[ H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0. \]

How large is "large"? We will develop a test statistic that is a function of $(\text{SSE}_R -
\text{SSE}_C)$ for which we know the distribution when $H_0$ is true.

To acquire this test statistic, let us assume that the null hypothesis is true and then
examine the quantities that we have calculated. Particularly, notice that

\[ \text{SSE}_R = \text{SSE}_C + (\text{SSE}_R - \text{SSE}_C). \]

In other words, as indicated in Figure 11.8, we have partitioned $\text{SSE}_R$ into two parts:
$\text{SSE}_C$ and the difference $(\text{SSE}_R - \text{SSE}_C)$. Although we omit the proof, if $H_0$ is true,
then

\[
\chi_3^2 = \frac{\text{SSE}_R}{\sigma^2}, \qquad
\chi_2^2 = \frac{\text{SSE}_C}{\sigma^2}, \qquad
\chi_1^2 = \frac{\text{SSE}_R - \text{SSE}_C}{\sigma^2}
\]

possess $\chi^2$ probability distributions in repeated sampling, with $(n - [g + 1])$,
$(n - [k + 1])$, and $(k - g)$ df, respectively. Further, it can be shown that $\chi_2^2$ and
$\chi_1^2$ are statistically independent.

The definition of a random variable with an $F$ distribution is given in Definition
7.3. Consider the ratio

\[
F = \frac{\chi_1^2/(k - g)}{\chi_2^2/(n - [k + 1])}
= \frac{(\text{SSE}_R - \text{SSE}_C)/(k - g)}{\text{SSE}_C/(n - [k + 1])}.
\]

If $H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0$ is true, then $F$ possesses an $F$ distribution with
$\nu_1 = k - g$ numerator degrees of freedom and $\nu_2 = n - (k + 1)$ denominator degrees
of freedom. We have previously argued that large values of $(\text{SSE}_R - \text{SSE}_C)$ lead us
to reject the null hypothesis. Thus, we see that large values of $F$ favor rejection of
$H_0$; if we desire a test with a type I error probability equal to $\alpha$, it follows that

\[ F > F_\alpha \]

is the appropriate rejection region. (See Table 7, Appendix 3.)
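The $F$ ratio above depends on the data only through the two sums of squares and the counts $n$, $k$, and $g$, so it is easy to package as a small function. The Python sketch below is one possible way to do this; the function name `f_statistic` and the numbers in the usage lines are hypothetical and serve only to illustrate the formula.

```python
def f_statistic(sse_r, sse_c, n, k, g):
    """F ratio for testing H0: beta_{g+1} = ... = beta_k = 0.

    sse_r, sse_c : SSE for the reduced and complete models
    n            : number of observations
    k, g         : number of independent variables in the complete
                   and reduced models, respectively (k > g)
    Returns (F, numerator df, denominator df).
    """
    nu1 = k - g
    nu2 = n - (k + 1)
    if nu1 <= 0 or nu2 <= 0:
        raise ValueError("need k > g and n > k + 1")
    if sse_c <= 0 or sse_r < sse_c:
        raise ValueError("need 0 < SSE_C <= SSE_R")
    f = ((sse_r - sse_c) / nu1) / (sse_c / nu2)
    return f, nu1, nu2


# Hypothetical illustration: n = 20 observations, complete model with
# k = 5 variables, reduced model with g = 2 variables.
f, nu1, nu2 = f_statistic(sse_r=100.0, sse_c=40.0, n=20, k=5, g=2)
print(f"F = {f:.2f} with {nu1} and {nu2} df")
```

The observed value would then be compared with the tabulated $F_\alpha$ for the same pair of degrees of freedom.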


EXAMPLE 11.18 Do the data of Example 11.13 provide sufficient evidence to indicate that the second-order
model

\[ Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \]

contributes information for the prediction of $Y$? That is, test the hypothesis $H_0: \beta_1 =
\beta_2 = 0$ against the alternative hypothesis $H_a$: at least one of the parameters $\beta_1$, $\beta_2$
differs from 0. Use $\alpha = .05$. Give bounds for the attained significance level.


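Solution For the complete model, we determined in Example 11.15 that $\text{SSE}_C = .463$. Because
we want to test $H_0: \beta_1 = \beta_2 = 0$, the appropriate reduced model is

\[ Y = \beta_0 + \varepsilon \]

for which

\[
\mathbf{Y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 3 \end{bmatrix}
\quad\text{and}\quad
\mathbf{X} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.
\]

Because $\mathbf{X}'\mathbf{X} = 5$, $(\mathbf{X}'\mathbf{X})^{-1} = 1/5$ and $\hat\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (1/5)\sum_{i=1}^{5} y_i = \bar{y} =
5/5 = 1$. Thus,

\[
\text{SSE}_R = \mathbf{Y}'\mathbf{Y} - \hat\beta\,\mathbf{X}'\mathbf{Y}
= \sum_{i=1}^{5} y_i^2 - \bar{y}\left(\sum_{i=1}^{5} y_i\right)
= \sum_{i=1}^{5} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{5} y_i\right)^2
= 11 - (1/5)(5)^2 = 11 - 5 = 6.
\]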
In this example, the number of independent variables in the complete model is $k = 2$,
and the number of independent variables in the reduced model is $g = 0$. Thus,

\[
F = \frac{(\text{SSE}_R - \text{SSE}_C)/(k - g)}{\text{SSE}_C/(n - [k + 1])}
= \frac{(6 - .463)/(2 - 0)}{.463/(5 - 3)} = 11.959.
\]

The tabulated $F$-value for $\alpha = .05$ with $\nu_1 = k - g = 2$ numerator degrees of
freedom and $\nu_2 = n - (k + 1) = 2$ denominator degrees of freedom is 19.00. Hence,
the observed value of the test statistic does not fall in the rejection region, and we
conclude that at the $\alpha = .05$ level there is not enough evidence to support a claim
that either $\beta_1$ or $\beta_2$ differs from zero. Because the proper form of the rejection region
is $F > F_\alpha$, the p-value is given by $P(F > 11.959)$ when $F$ is based on 2 numerator
and 2 denominator degrees of freedom. Using Table 7, Appendix 3, you can see
that .05 < p-value < .10. Further, the applet F-Ratio Probabilities and Quantiles
gives $P(F > 11.959) = .07717$. Thus, if we chose $\alpha = .05$ (in agreement with the
previous discussion), there is not enough evidence to support a claim that either $\beta_1$
or $\beta_2$ differs from zero. However, if any $\alpha$ value equal to or greater than .0772 were
selected, we could claim that either $\beta_1 \neq 0$ or $\beta_2 \neq 0$. Notice that the little additional
effort required to determine the p-value provides a considerable amount of additional
information.
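The numbers in Example 11.18 can be reproduced without tables, because an $F$ distribution whose numerator has exactly 2 df has a closed-form upper-tail probability, $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. The Python sketch below relies on that special case; it is not a general-purpose $F$ probability routine, and the function name is our own.

```python
sse_r, sse_c = 6.0, 0.463      # from Example 11.18
n, k, g = 5, 2, 0

nu1 = k - g                    # numerator df
nu2 = n - (k + 1)              # denominator df
f_stat = ((sse_r - sse_c) / nu1) / (sse_c / nu2)


def f_upper_tail_two_num_df(f, nu2):
    """P(F > f) when the numerator has exactly 2 df (closed form)."""
    return (1.0 + 2.0 * f / nu2) ** (-nu2 / 2.0)


p_value = f_upper_tail_two_num_df(f_stat, nu2)

print(f"F = {f_stat:.3f}, p-value = {p_value:.5f}")
# The tabulated critical value 19.00 has upper-tail area .05 for (2, 2) df.
print(f"P(F > 19.00) = {f_upper_tail_two_num_df(19.0, nu2):.2f}")
```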


Consider the situation where we have fit a model with $k$ independent variables and
wish to test the null hypothesis

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \]

that none of the independent variables in the model contribute substantial information
for the prediction of $Y$. This is exactly what was done in Example 11.18. An examination
of the solution of that example will convince you that the appropriate reduced
model is of the form

\[ Y = \beta_0 + \varepsilon. \]

This reduced model contains $g = 0$ independent variables and is such that $\text{SSE}_R = S_{yy}$
(see Example 11.18). Thus, a test for

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \]

can be based on the statistic

\[
F = \frac{(\text{SSE}_R - \text{SSE}_C)/(k - g)}{\text{SSE}_C/(n - [k + 1])}
= \frac{(S_{yy} - \text{SSE}_C)/k}{\text{SSE}_C/(n - [k + 1])},
\]

which possesses an $F$ distribution with $\nu_1 = k$ and $\nu_2 = n - (k + 1)$ numerator and
denominator degrees of freedom, respectively.

What proportion of the variation in the observed values of the response variable,
$Y$, is explained by the entire set of independent variables $x_1, x_2, \ldots, x_k$? The answer
is provided by the multiple coefficient of determination $R^2$, where

\[ R^2 = \frac{S_{yy} - \text{SSE}_C}{S_{yy}}. \]

As with the simple coefficient of determination $r^2$, the denominator of $R^2$ quantifies
the variation in the $y$-values, and the numerator quantifies the amount of variation in
the $y$'s that is explained by the complete set of independent variables $x_1, x_2, \ldots, x_k$.
In Exercise 11.84(a), you will show that the $F$ statistic for testing

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \]

can be calculated using $R^2$ through the formula

\[ F = \frac{n - (k + 1)}{k}\left(\frac{R^2}{1 - R^2}\right). \]

As before, this statistic possesses an $F$ distribution with $\nu_1 = k$ and $\nu_2 = n - (k + 1)$
numerator and denominator degrees of freedom, respectively.
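A quick numerical check of the $R^2$ form of the statistic may be helpful. The Python sketch below uses the quantities from Example 11.18 ($S_{yy} = 6$, $\text{SSE}_C = .463$, $n = 5$, $k = 2$) and computes $F$ both ways; the variable names are our own.

```python
s_yy, sse_c = 6.0, 0.463       # from Example 11.18
n, k = 5, 2

r2 = (s_yy - sse_c) / s_yy     # multiple coefficient of determination

# F computed from the sums of squares ...
f_direct = ((s_yy - sse_c) / k) / (sse_c / (n - (k + 1)))
# ... and from R^2 alone.
f_from_r2 = ((n - (k + 1)) / k) * (r2 / (1 - r2))

print(f"R^2 = {r2:.4f}")
print(f"F (sums of squares) = {f_direct:.3f}, F (from R^2) = {f_from_r2:.3f}")
```

Notice that a value of $R^2$ above .92 still fails to give a significant $F$ at the .05 level here, because only 2 df remain for estimating $\sigma^2$.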

Another application of the general method for comparing complete and reduced 
models is given in the following example. 


EXAMPLE 11.19 It is desired to relate abrasion resistance of rubber ($Y$) to the amount of silica filler $x_1'$
and the amount of coupling agent $x_2'$. Fine-particle silica fibers are added to rubber to
increase strength and resistance to abrasion. The coupling agent chemically bonds the
filler to the rubber polymer chains and thus increases the efficiency of the filler. The
unit of measurement for $x_1'$ and $x_2'$ is parts per 100 parts of rubber, which is denoted
phr. For computational simplicity, the actual amounts of silica filler and coupling
agent are rescaled by the equations

\[ x_1 = \frac{x_1' - 50}{6.7} \quad\text{and}\quad x_2 = \frac{x_2' - 4}{2}. \]

(Such rescaling of the independent variables does not affect the analysis or conclusions,
but it does simplify computations.)

The data¹¹ are given in Table 11.6. Notice that five levels of both $x_1$ and $x_2$ are
used, with the $(x_1 = 0, x_2 = 0)$ point repeated three times. Let us fit the second-order
model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon \]

to these data. This model represents a conic surface over the $(x_1, x_2)$ plane. Fit the
second-order model and test $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. (We are testing that the surface
is actually a plane versus the alternative that it is a conic surface.) Give bounds for the
attained significance level and indicate the proper conclusion if we choose $\alpha = .05$.


Table 11.6 Data for Example 11.19

      y      x1      x2
     83       1      -1
    113       1       1
     92      -1       1
     82      -1      -1
    100       0       0
     96       0       0
     98       0       0
     95       0     1.5
     80       0    -1.5
    100     1.5       0
     92    -1.5       0

11. Source: Ronald Suich and G. C. Derringer, Technometrics 19(2) (1977): 214.


Solution We will first use matrix equations to fit the complete model, as indicated earlier.
(With models of this size, it is best to use a computer to do the computations.) For the
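data in Table 11.6, we have

\[
\mathbf{Y} = \begin{bmatrix} 83 \\ 113 \\ 92 \\ 82 \\ 100 \\ 96 \\ 98 \\ 95 \\ 80 \\ 100 \\ 92 \end{bmatrix},
\qquad
\mathbf{X} =
\begin{bmatrix}
1 & 1 & -1 & 1 & 1 & -1 \\
1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & 1 & 1 & -1 \\
1 & -1 & -1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1.5 & 0 & 2.25 & 0 \\
1 & 0 & -1.5 & 0 & 2.25 & 0 \\
1 & 1.5 & 0 & 2.25 & 0 & 0 \\
1 & -1.5 & 0 & 2.25 & 0 & 0
\end{bmatrix},
\]

where the columns of $\mathbf{X}$ correspond to $x_0$, $x_1$, $x_2$, $x_1^2$, $x_2^2$, and $x_1 x_2$, and

\[
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix}
.33 & 0 & 0 & -.15 & -.15 & 0 \\
0 & .12 & 0 & 0 & 0 & 0 \\
0 & 0 & .12 & 0 & 0 & 0 \\
-.15 & 0 & 0 & .15 & .05 & 0 \\
-.15 & 0 & 0 & .05 & .15 & 0 \\
0 & 0 & 0 & 0 & 0 & .25
\end{bmatrix}.
\]

These matrices yield

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 98.00 \\ 4.00 \\ 7.35 \\ -.88 \\ -4.66 \\ 5.00 \end{bmatrix},
\]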

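or the fitted second-order model,

\[ \hat{y} = 98.00 + 4.00x_1 + 7.35x_2 - .88x_1^2 - 4.66x_2^2 + 5.00x_1 x_2. \]

For this model, $\text{SSE}_C = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = 77.948$.
To test the hypothesis of interest ($H_0: \beta_3 = \beta_4 = \beta_5 = 0$), we must fit the reduced
model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon. \]

By deleting the columns for $x_1^2$, $x_2^2$, and $x_1 x_2$ in the $\mathbf{X}$ matrix, we have

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 93.73 \\ 4.00 \\ 7.35 \end{bmatrix},
\]

and the fitted planar model is

\[ \hat{y} = 93.73 + 4.00x_1 + 7.35x_2. \]

(Notice that we cannot simply set $\hat\beta_3$, $\hat\beta_4$, and $\hat\beta_5$ equal to zero to produce the fitted
model in the reduced case.) For the reduced model, $\text{SSE}_R = 326.623$.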
We now test the hypothesis $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ by calculating $F$ (notice that
$k = 5$, $g = 2$, and $n = 11$):

\[
F = \frac{(\text{SSE}_R - \text{SSE}_C)/(k - g)}{\text{SSE}_C/[n - (k + 1)]}
= \frac{(326.623 - 77.948)/3}{77.948/5} = 5.32.
\]

Because this statistic is based on $\nu_1 = (k - g) = 3$ numerator degrees of freedom
and $\nu_2 = n - (k + 1) = 5$ denominator degrees of freedom, the p-value is given by
$P(F > 5.32)$. Thus, using Table 7, Appendix 3, .05 < p-value < .10. The applet
F-Ratio Probabilities and Quantiles gives the exact p-value = $P(F > 5.32)$ =
.05155. If we choose $\alpha = .05$, there is insufficient evidence to support a claim that
the second-order model fits the data significantly better than does the planar model. Is
the exact p-value = .05155 small enough to convince you that the second-order model
fits better than the planar model? Only you can answer that question. Notice that we
have tested whether the group of variables $x_1^2$, $x_2^2$, and $x_1 x_2$ contributed to a significantly
better fit of the model to the data.
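Example 11.19 remarks that a computer is the sensible tool for models of this size. The Python sketch below shows one way the whole comparison could be carried out from the data in Table 11.6, using only the standard library. The helpers `solve` and `fit_sse` are our own illustrative names; they solve the normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$ by Gaussian elimination, which is adequate for a small, well-conditioned problem like this one but is not how production regression software is usually written.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_sse(rows, y):
    """Least-squares fit via the normal equations; returns (beta_hat, SSE)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, sum(e * e for e in resid)


# Data of Table 11.6
y = [83, 113, 92, 82, 100, 96, 98, 95, 80, 100, 92]
x1 = [1, 1, -1, -1, 0, 0, 0, 0, 0, 1.5, -1.5]
x2 = [-1, 1, 1, -1, 0, 0, 0, 1.5, -1.5, 0, 0]

complete = [[1, a, b, a * a, b * b, a * b] for a, b in zip(x1, x2)]
reduced = [[1, a, b] for a, b in zip(x1, x2)]

beta_c, sse_c = fit_sse(complete, y)
beta_r, sse_r = fit_sse(reduced, y)

n, k, g = len(y), 5, 2
f_stat = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))

print("complete model:", [round(b, 2) for b in beta_c], "SSE =", round(sse_c, 3))
print("reduced model: ", [round(b, 2) for b in beta_r], "SSE =", round(sse_r, 3))
print("F =", round(f_stat, 2))
```

Fitting the two models separately, rather than zeroing coefficients of the complete fit, is what produces the different intercepts in the two fitted equations.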


Exercises


11.80 Refer to Exercise 11.31. Answer the question on the increase in peak current by constructing
an $F$ test.


11.81 In Exercise 11.80, you used an $F$ test to test the same hypothesis that was tested in Exercise
11.31 via a $t$ test. Consider the general simple linear regression case and the $F$ and $t$ statistics
that can be used to implement the test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. Show that in general
$F = t^2$. Compare the value of $F$ obtained in Exercise 11.80 to the corresponding value of $t$
obtained in Exercise 11.31.


11.82 Refer to Exercise 11.76 where we obtained the following information when fitting a multiple
regression model to 15 responses:

\[ \hat{y} = 38.83 - 0.0092x_1 - 0.92x_2 + 11.56x_3, \quad S_{yy} = 10965.46, \quad \text{SSE} = 1107.01. \]

a Is there sufficient evidence to conclude that at least one of the independent variables
contributes significant information for the prediction of $Y$?

b Calculate the value of the multiple coefficient of determination. Interpret the value of $R^2$.


11.83 Refer to Exercises 11.76 and 11.82. Does including the variables phosphate saturation $x_2$ and
pH $x_3$ contribute to a significantly better fit of the model to the data? The reduced linear
regression model, $Y = \beta_0 + \beta_1 x_1 + \varepsilon$, was fit and we observed $\text{SSE}_R = 5470.07$.

a Implement the appropriate test of hypothesis at the $\alpha = .05$ level of significance.

b What is the smallest value of $\text{SSE}_R$ that would have allowed you to conclude that at least
one of the variables (phosphate saturation and/or pH) contributed to a better fit of the model
to the data?


11.84 We have fit a model with $k$ independent variables and wish to test the null hypothesis
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$.

a Show that the appropriate $F$-distributed test statistic can be expressed as

\[ F = \frac{n - (k + 1)}{k}\left(\frac{R^2}{1 - R^2}\right). \]

b If $k = 1$, how does the value of $F$ from part (a) compare to the expression for the $T$ statistic
derived in Exercise 11.55?


A real estate agent's computer data listed the selling price $Y$ (in thousands of dollars), the
living area $x_1$ (in hundreds of square feet), the number of floors $x_2$, number of bedrooms $x_3$,
and number of bathrooms $x_4$ for newly listed condominiums. The multiple regression model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ was fit to the data obtained by randomly selecting
15 condos currently on the market.


a If $R^2 = .942$, is there sufficient evidence that at least one of the independent variables
contributes significant information for the prediction of selling price?


b If $S_{yy} = 16382.2$, what is SSE?


Refer to Exercise 11.85. A realtor suspects that square footage $x_1$ might be the most important
predictor variable and that the other variables can be eliminated from the model without much
loss in prediction information. The simple linear regression model for selling price versus
square footage was fit to the 15 data points that were used in Exercise 11.85, and the realtor
observed that SSE = 1553. Can the additional independent variables used to fit the model in
Exercise 11.85 be dropped from the model without losing predictive information? Test at the
$\alpha = .05$ significance level.


Does a large value of $R^2$ always imply that at least one of the independent variables should be
retained in the regression model? Does a small value of $R^2$ always indicate that none of the
independent variables are useful for prediction of the response?


a Suppose that a model with $k = 4$ independent variables is fit using $n = 7$ data points and
that $R^2 = .9$. How many numerator and denominator degrees of freedom are associated
with the $F$ statistic for testing $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$? Use the result in Exercise
11.84(a) to compute the value of the appropriate $F$ statistic. Can $H_0$ be rejected at the
$\alpha = .10$ significance level?

b Refer to part (a). What do you observe about the relative sizes of $n$ and $k$? What impact
does this have on the value of $F$?

c A model with $k = 3$ independent variables is fit to $n = 44$ data points resulting in
$R^2 = .15$. How many numerator and denominator degrees of freedom are associated with
the $F$ statistic for testing $H_0: \beta_1 = \beta_2 = \beta_3 = 0$? Use the result in Exercise 11.84(a)
to compute the value of the appropriate $F$ statistic. Can $H_0$ be rejected at the $\alpha = .10$
significance level?

d Refer to part (c). What do you observe about the relative sizes of $n$ and $k$? What impact
does this have on the value of $F$?


Television advertising would ideally be aimed at exactly the audience that observes the ads.
A study was conducted to determine the amount of time that individuals spend watching TV
during evening prime-time hours. Twenty individuals were observed for a 1-week period, and
the average time spent watching TV per evening, $Y$, was recorded for each. Four other bits
of information were also recorded for each individual: $x_1$ = age, $x_2$ = education level, $x_3$ =
disposable income, and $x_4$ = IQ. Consider the three models given below:

Model I: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$
Model II: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Model III: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

Are the following statements true or false?

a If Model I is fit, the estimate for $\sigma^2$ is based on 16 df.

b If Model II is fit, we can perform a $t$ test to determine whether $x_2$ contributes to a better fit
of the model to the data.

c If Models I and II are both fit, then $\text{SSE}_I \leq \text{SSE}_{II}$.

d If Models I and II are fit, then $\hat\sigma_I^2 \leq \hat\sigma_{II}^2$.

e Model II is a reduction of model I.

f Models I and III can be compared using the complete/reduced model technique presented
in Section 11.14.


Refer to the three models given in Exercise 11.88. Let $R_I^2$, $R_{II}^2$, and $R_{III}^2$ denote the coefficients
of determination for models I, II, and III. Are the following statements true or false?


a $R_I^2 \geq R_{II}^2$.

b $R_I^2 \geq R_{III}^2$.

c $R_{II}^2 \leq R_{III}^2$.


Refer to Exercise 11.69.


a For the quadratic model, carry out an $F$ test of $H_0: \beta_2 = 0$, using $\alpha = .05$. Compare the
result to the result of the test in Exercise 11.72.


b Test $H_0: \beta_1 = \beta_2 = 0$ at the 5% significance level.


Refer to Exercise 11.74. Test the hypothesis at the 5% level of significance that neither $T_1$ nor
$T_2$ affects the yield.


Utility companies, which must plan the operation and expansion of electricity generation, are
vitally interested in predicting customer demand over both short and long periods of time. A
short-term study was conducted to investigate the effect of each month's mean daily temperature
$x_1$ and of cost per kilowatt-hour $x_2$ on the mean daily consumption (in kWh) per household. The
company officials expected the demand for electricity to rise in cold weather (due to heating),
fall when the weather was moderate, and rise again when the temperature rose and there was
a need for air conditioning. They expected demand to decrease as the cost per kilowatt-hour
increased, reflecting greater attention to conservation. Data were available for 2 years, a period
during which the cost per kilowatt-hour $x_2$ increased due to the increasing costs of fuel. The
company officials fitted the model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2 + \varepsilon \]

to the data in the following table and obtained $\hat{y} = 325.606 - 11.383x_1 + .113x_1^2 - 21.699x_2 +
.873x_1 x_2 - .009x_1^2 x_2$ with SSE = 152.177.


    Price per
    kWh (x2)   Mean Daily Consumption (kWh) per Household

      8¢       Mean daily °F temperature (x1)    31   34   39   42   47   56
               Mean daily consumption (y)        55   49   46   47   40   43
     10¢       Mean daily °F temperature (x1)    32   36   39   42   48   56
               Mean daily consumption (y)        50   44   42   42   38   40
      8¢       Mean daily °F temperature (x1)    62   66   68   71   75   78
               Mean daily consumption (y)        41   46   44   51   62   73
     10¢       Mean daily °F temperature (x1)    62   66   68   72   75   79
               Mean daily consumption (y)        39   44   40   44   50   55
When the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$ was fit, the prediction equation was $\hat{y} = 130.009 -
3.302x_1 + .033x_1^2$ with SSE = 465.134. Test whether the terms involving $x_2$ ($x_2$, $x_1 x_2$, $x_1^2 x_2$)
contribute to a significantly better fit of the model to the data. Give bounds for the attained
significance level.


Refer to Example 11.19. Using the reduced model, construct a 95% confidence interval for the
expected abrasion resistance of rubber when $x_1 = 1$ and $x_2 = -1$.


Refer to Example 11.19. Construct individual tests of the three hypotheses $H_0: \beta_3 = 0$,
$H_0: \beta_4 = 0$, and $H_0: \beta_5 = 0$. Use a 1% level of significance on each test. (If multiple tests are
to be conducted on the same set of data, it is wise to use a very small $\alpha$ level on each test.)


Summary and Concluding Remarks 


In this chapter, we have used the method of least squares to fit a linear model to
an experimental response. We assumed that the expected value of $Y$ is a function
of a set of variables $x_1, x_2, \ldots, x_k$, where the function is linear in a set of unknown
parameters. We used the expression

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \]

to denote a linear statistical model.

Inferential problems associated with the linear statistical model include estimation
and tests of hypotheses relating to the model parameters $\beta_0, \beta_1, \ldots, \beta_k$ and,
even more important, estimation of $E(Y)$, the expected response for a particular
setting, and the prediction of some future value of $Y$. Experiments for which the
least-squares theory is appropriate include both controlled experiments and those
where $x_1, x_2, \ldots, x_k$ are observed values of random variables.

Why use the method of least squares to fit a linear model to a set of data? Where the
assumptions about the random errors $\varepsilon$ hold [normality, independence, $V(\varepsilon) = \sigma^2$
for all values of $x_1, x_2, \ldots, x_k$], it can be shown that the least-squares procedure gives
the best linear unbiased estimators for $\beta_0, \beta_1, \ldots, \beta_k$. That is, if we estimate the
parameters $\beta_0, \beta_1, \ldots, \beta_k$ using linear functions of $y_1, y_2, \ldots, y_n$, the least-squares
estimators have minimum variance. Some other nonlinear estimators for the parameters
may possess a smaller variance than the least-squares estimators, but if such
estimators exist, they are not known at this time. Again, why use least-squares estimators?
They are easy to use, and we know they possess good properties for many
situations.

As you might imagine, the methodology presented in this chapter is employed 
widely in business and in all the sciences for exploring the relationship between a 
response and a set of independent variables. Estimation of E(Y) or prediction of Y 
usually is the experimental objective. 

Whole textbooks are devoted to the topic of regression. Our purpose has been to
introduce many of the theoretical considerations associated with simple and multiple
linear regression. Although the method of least squares can be used to estimate model
parameters in general situations, the formal inference-making techniques that we presented
(based on the $t$ and $F$ distributions) are valid only under the extra assumptions
that we presented. Key assumptions include that the error terms in the model are normally
distributed and that the variance of the error terms does not depend on the value
of any independent variable(s). In practical applications, these assumptions may not
be valid. Generally, assessments of the validity of model assumptions are based on
analyses of the residuals, the differences between the observed and predicted (using
the model) values of the response variable. Examination of the residuals, including
plots of the residuals versus the independent variable(s) and plots of the residuals
against their normal theory expected values, permits assessments of whether the
assumptions are reasonable for a particular data set. Data points with unusually large
residuals may be outliers that indicate that something went wrong when the corresponding
observation was made. Some individual data points may have an unusually
large impact on the fitted regression model in the sense that the model fitted
with these data points included differs considerably from the model fitted with them
excluded (such points are often called high-influence points; see Exercise 11.108).
A regression model might suffer from lack of fit, indicating that the selected model
is not adequate to model the response. In such cases, it might be necessary to fit a
more complicated model to obtain sufficient predictive precision. An important consideration
in multiple regression models is that of multicollinearity, where some of
the independent variables in the model are highly correlated with one another. We
cannot do justice to these topics in a single introductory chapter on linear and multiple
regression. We have focused on the general concept of least squares as a method for
estimating model parameters and have provided the theoretical foundations for analyses
based on the classical normal theory. The other issues described in this section
are discussed in the supplemental references.
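To make the idea of a residual concrete, the Python sketch below computes the residuals for the straight-line fit used throughout the chapter. It assumes that the data of Example 11.1 are the five points with $x = -2, -1, 0, 1, 2$ and $y = 0, 0, 1, 1, 3$; this assumption is consistent with the quantities quoted in this section ($\hat\beta_0 = 1$, $\hat\beta_1 = .7$, $s^2 = .367$), but the data set itself is not reproduced here. The variable names are our own.

```python
# Assumed data of Example 11.1 (see the remark above).
x = [-2, -1, 0, 1, 2]
y = [0, 0, 1, 1, 3]
beta0_hat, beta1_hat = 1.0, 0.7     # fitted line from Example 11.12

fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(e * e for e in residuals)
s2 = sse / (len(y) - 2)             # two parameters were estimated

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x = {xi:2d}  y = {yi}  fitted = {fi:4.1f}  residual = {ei:5.2f}")
print(f"SSE = {sse:.2f}, s^2 = {s2:.3f}")
```

In practice these residuals would be plotted against $x$ and against their normal theory expected values, as described above; with only five points such plots can suggest, but hardly establish, a departure from the assumptions.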
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Supplementary Exercises 


11.95 At temperatures approaching absolute zero (-273°C), helium exhibits traits that defy many 
laws of conventional physics. An experiment has been conducted with helium in solid form at 
various temperatures near absolute zero. The solid helium is placed in a dilution refrigerator 
along with a solid impure substance, and the fraction (in weight) of the impurity passing 
through the solid helium is recorded. (The phenomenon of solids passing directly through 
solids is known as quantum tunneling.) The data are given in the following table. 


°C Temperature (x)    Proportion of Impurity Passing Through Helium (y) 
-262.0                .315 
-265.0                .202 
-256.0                .204 
-267.0                .620 
-270.0                .715 
-272.0                .935 
-272.4                .957 
-272.7                .906 
-272.8                .985 
-272.9                .987 


a Fit a least-squares line to the data. 

b Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative hypothesis $H_a: \beta_1 < 0$, at the 
$\alpha = .01$ level of significance. 

c Find a 95% prediction interval for the percentage of the solid impurity passing through 
solid helium at -273°C. (This value of x is outside the experimental region where use of 
the model for prediction may be dangerous.) 


11.96 A study was conducted to determine whether a linear relationship exists between the breaking 
strength y of wooden beams and the specific gravity x of the wood. Ten randomly selected 
beams of the same cross-sectional dimensions were stressed until they broke. The breaking 
strengths and the density of the wood are shown in the accompanying table for each of the 
ten beams. 


Beam    Specific Gravity (x)    Strength (y) 
1       .499                    11.14 
2       .558                    12.74 
3       .604                    13.13 
4       .441                    11.51 
5       .550                    12.38 
6       .528                    12.60 
7       .418                    11.13 
8       .480                    11.70 
9       .406                    11.02 
10      .467                    11.41 


a Fit the model $Y = \beta_0 + \beta_1 x + \varepsilon$. 

b Test $H_0: \beta_1 = 0$ against the alternative hypothesis, $H_a: \beta_1 \ne 0$. 

c Estimate the mean strength for beams with specific gravity .590, using a 90% confidence 
interval. 


11.97 A response Y is a function of three independent variables $x_1$, $x_2$, and $x_3$ that are related as 
follows: 

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon. \]

a Fit this model to the n = 7 data points shown in the accompanying table. 

y     x1     x2     x3 
1     -3      5     -1 
0     -2      0      1 
0     -1     -3      1 
1      0     -4      0 
2      1     -3     -1 
3      2      0     -1 
3      3      5      1 

b Predict Y when $x_1 = 1$, $x_2 = -3$, $x_3 = -1$. Compare with the observed response in the 
original data. Why are these two not equal? 

c Do the data present sufficient evidence to indicate that $x_3$ contributes information for the 
prediction of Y? (Test the hypothesis $H_0: \beta_3 = 0$, using $\alpha = .05$.) 

d Find a 95% confidence interval for the expected value of Y, given $x_1 = 1$, $x_2 = -3$, and 
$x_3 = -1$. 

e Find a 95% prediction interval for Y, given $x_1 = 1$, $x_2 = -3$, and $x_3 = -1$. 


11.98 If values of independent variables are equally spaced, what is the advantage of coding to new 
variables that represent symmetric spacing about the origin? 


11.99 Suppose that you wish to fit a straight line to a set of n data points, where n is an even integer, 
and that you can select the n values of x in the interval $-9 \le x \le 9$. How should you select 
the values of x so as to minimize $V(\hat{\beta}_1)$? 


11.100 Refer to Exercise 11.99. It is common to employ equal spacing in selecting the values of x. 
Suppose that n = 10. Find the relative efficiency of the estimator $\hat{\beta}_1$ based on equal spacing 
versus the same estimator based on the spacing of Exercise 11.99. Assume that $-9 \le x \le 9$. 


11.101 The data in the accompanying table come from the comparison of the growth rates for bacteria 
types A and B. The growth Y recorded at five equally spaced (and coded) points of time is 
shown in the table. 

                          Time 
Bacteria Type     -2      -1       0       1       2 
A                8.0     9.0     9.1    10.2    10.4 
B               10.0    10.3    12.2    12.6    13.9 


a Fit the linear model 

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \]

to the n = 10 data points. Let $x_1 = 1$ if the point refers to bacteria type B and let $x_1 = 0$ 
if the point refers to type A. Let $x_2$ = coded time. 

b Plot the data points and graph the two growth lines. Notice that $\beta_3$ is the difference between 
the slopes of the two lines and represents time-bacteria interaction. 

c Predict the growth of type A at time $x_2 = 0$ and compare the answer with the graph. Repeat 
the process for type B. 

d Do the data present sufficient evidence to indicate a difference in the rates of growth for 
the two types of bacteria? 

e Find a 90% confidence interval for the expected growth for type B at time $x_2 = 1$. 

f Find a 90% prediction interval for the growth Y of type B at time $x_2 = 1$. 


11.102 The following model was proposed for testing whether there was evidence of salary discrimination 
against women in a state university system: 

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \varepsilon, \]

where 

Y = annual salary (in thousands of dollars), 
$x_1$ = 1 if female, 0 if male, 
$x_2$ = amount of experience (in years). 


When this model was fit to data obtained from the records of 200 faculty members, SSE = 
783.90. The reduced model $Y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon$ was also fit and produced a value 
of SSE = 795.23. Do the data provide sufficient evidence to support the claim that the mean 
salary depends on the gender of the faculty members? Use $\alpha = .05$. 


11.103 Show that the least-squares prediction equation 

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k \]

passes through the point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k, \bar{y})$. 


11.104 An experiment was conducted to determine the effect of pressure and temperature on the yield 
of a chemical. Two levels of pressure (in pounds per square inch, psi) and three of temperature 
were used: 

Pressure (psi)    Temperature (°F) 
50                100 
80                200 
                  300 


One run of the experiment at each temperature-pressure combination gave the data listed in 
the following table. 

Yield    Pressure (psi)    Temperature (°F) 
21       50                100 
23       50                200 
26       50                300 
22       80                100 
23       80                200 
28       80                300 


a Fit the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2 + \varepsilon$, where $x_1$ = pressure and $x_2$ = temperature. 

b Test to see whether $\beta_3$ differs significantly from zero, with $\alpha = .05$. 

c Test the hypothesis that temperature does not affect the yield, with $\alpha = .05$. 


11.105 Let (X, Y) have a bivariate normal distribution. A test of $H_0: \rho = 0$ against $H_a: \rho \ne 0$ can be 
derived as follows. 

a Let $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. Show that 

\[ \hat{\beta}_1 = r \sqrt{\frac{S_{yy}}{S_{xx}}}. \]

b Conditional on $X_i = x_i$, for i = 1, 2, ..., n, show that under $H_0: \rho = 0$ 

\[ \frac{\hat{\beta}_1 \sqrt{(n - 2) S_{xx}}}{\sqrt{S_{yy}(1 - r^2)}} \]

has a t distribution with (n - 2) df. 

c Conditional on $X_i = x_i$, for i = 1, 2, ..., n, conclude that 

\[ T = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \]

has a t distribution with (n - 2) df, under $H_0: \rho = 0$. Hence, conclude that T has the same 
distribution unconditionally. 


11.106 Labor and material costs are two basic components in the cost of construction. Changes in the 
component costs of course lead to changes in total construction costs. The accompanying table 
tracks changes in construction cost and cost of all construction materials for 8 consecutive 
months. 

Month       Construction Cost (y)    Index of All Construction Materials (x) 
January     193.2                    180.0 
February    193.1                    181.7 
March       193.6                    184.1 
April       195.1                    185.3 
May         195.6                    185.7 
June        198.1                    185.9 
July        200.9                    187.7 
August      202.7                    189.6 


Do the data provide sufficient evidence to indicate a nonzero correlation between the monthly 
construction costs and indexes of all construction materials? Give the attained significance 
level. 


11.107 The data in the following table give the miles per gallon obtained by a test automobile when 
using gasolines of varying octane levels. 

Miles per Gallon (y)    Octane (x) 
13.0                    89 
13.2                    93 
13.0                    87 
13.6                    90 
13.3                    89 
13.8                    95 
14.1                    100 
14.0                    98 


a Calculate the value of r. 

b Do the data provide sufficient evidence to indicate that octane level and miles per gallon 
are dependent? Give the attained significance level, and indicate your conclusion if you 
wish to implement an $\alpha = .05$ level test. 


11.108 Applet Exercise Access the applet Removing Points from Regression. Sometimes removing a 
point from those used to fit a regression model produces a fitted model substantially different 
from the one obtained using all of the data (such a point is called a high-influence point). 


a The top graph gives a data set and fitted regression line useful for predicting a student's 
weight given his or her height. Click on any data points to remove them and refit the 
regression model. Can you find a high-influence data point in this data set? 

b Scroll down to the second graph that relates quantitative SAT score to high school rank. 
Does the slope of the fitted regression line surprise you? Can you find a high-influence 
data point? Does removing that data point produce a regression line that better meets your 
expectation regarding the relationship between quantitative SAT scores and class rank? 

c Scroll down to the remainder of the data sets and explore what happens when different 
data points are removed. 


The Elements Affecting the Information in a Sample 


A meaningful measure of the information available in a sample to make an inference 
about a population parameter is provided by the width (or half-width) of the confidence 
interval that could be constructed from the sample data. Recall that a 95% large-sample 
confidence interval for a population mean is 


\[ \bar{Y} \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right). \]


The widths of many of the commonly employed confidence intervals, like the confidence 
interval for a population mean, depend on the population variance $\sigma^2$ and the 
sample size n. The less variation in the population, measured by $\sigma^2$, the shorter the 
confidence interval will be. Similarly, the width of the confidence interval decreases 
as n increases. This interesting phenomenon would lead us to believe that two factors 
affect the quantity of information in a sample pertinent to a parameter: namely, the 
variation of the data and the sample size n. We will find this deduction to be slightly 
oversimplified but essentially true. 
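
The dependence of the interval width on both the variation and the sample size can be checked numerically. The following is a minimal sketch, assuming the large-sample 95% interval given above with a known population standard deviation; the function name `half_width` and the illustrative values of sigma and n are hypothetical and do not come from the text.

```python
import math


def half_width(sigma, n, z=1.96):
    """Half-width of the large-sample 95% interval: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


# Illustrative (made-up) settings: quadrupling n halves the half-width,
# and halving sigma halves it as well.
for sigma, n in [(10.0, 25), (10.0, 100), (5.0, 100)]:
    print(f"sigma = {sigma}, n = {n}: half-width = {half_width(sigma, n):.3f}")
```

Because n enters through its square root, halving the width by sampling alone requires four times as many observations, whereas halving the standard deviation achieves the same result with no extra observations.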

In previous chapters, when we were interested in comparing two population means 
or fitting a simple linear regression, we assumed that independent random samples 
were taken from the populations of interest. If we wish to compare two populations 
based on a total of n observations, how many observations should be taken from each 
population? If we have decided to fit a simple linear regression model and wish to 
maximize the information in the resulting data, how should we choose the values of 
the independent variable? These questions are addressed in the next section. 
Generally, the design of experiments is a very broad subject concerned with meth- 
ods of sampling to reduce the variation in an experiment and thereby to acquire a 
specified quantity of information at minimum cost. If the objective is to make a com- 
parison of two population means, the matched-pairs experiment often suffices. After 
considering the matched-pairs experiment in Section 12.3, the remainder of the chap- 
ter presents some of the important considerations in the design of good experiments. 


Designing Experiments 
to Increase Accuracy 


As we will see, for the same total number of observations, some methods of data 
collection (designs) provide more information concerning specific population param- 
eters than others. No single design is best in acquiring information concerning all 
types of population parameters. Indeed, the problem of finding the best design for 
focusing information on a specific population parameter has been solved in only a 
few specific cases. The purpose of this section is not to present a general theory but 
rather to present two examples that illustrate the principles involved. 

Consider the problem of estimating the difference between a pair of population 
means, $\mu_1 - \mu_2$, based on independent random samples. If the experimenter has 
resources sufficient to sample a total of n observations, how many observations should 
she select from populations 1 and 2 (say, $n_1$ and $n_2$, respectively, with $n_1 + n_2 = n$) to 
maximize the information in the data pertinent to $\mu_1 - \mu_2$? If n = 10, should she 
select $n_1 = n_2 = 5$ observations from each population, or would an allocation of 
$n_1 = 4$ and $n_2 = 6$ be better? 

If the random samples are independently drawn, we estimate $\mu_1 - \mu_2$ with $\bar{Y}_1 - \bar{Y}_2$, 
which has standard error 

\[ \sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]


The smaller $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$ is, the smaller will be the corresponding error of estimation, and 
the greater will be the quantity of information in the sample pertinent to $\mu_1 - \mu_2$. If, 
as we frequently assume, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then 

\[ \sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}. \]

You can verify that this quantity is a minimum when $n_1 = n_2$ and, consequently, 
that the sample contains a maximum of information about $\mu_1 - \mu_2$ when the n 
experimental units are equally divided between the two treatments. A more general 
case is considered in Example 12.1. 


EXAMPLE 12.1 

If n observations are to be used to estimate $\mu_1 - \mu_2$, based on independent random 
samples from the two populations of interest, find $n_1$ and $n_2$ so that $V(\bar{Y}_1 - \bar{Y}_2)$ is 
minimized (assume that $n_1 + n_2 = n$). 


Solution 

Let b denote the fraction of the n observations assigned to the sample from population 
1; that is, $n_1 = bn$ and $n_2 = (1 - b)n$. Then, 

\[ V(\bar{Y}_1 - \bar{Y}_2) = \frac{\sigma_1^2}{bn} + \frac{\sigma_2^2}{(1 - b)n}. \]

To find the fraction b that minimizes this variance, we set the first derivative, with 
respect to b, equal to zero. This process yields 

\[ -\frac{\sigma_1^2}{n}\left(\frac{1}{b^2}\right) + \frac{\sigma_2^2}{n}\left(\frac{1}{1 - b}\right)^2 = 0. \]

Solving for b, we obtain 

\[ b = \frac{\sigma_1}{\sigma_1 + \sigma_2} \quad \text{and} \quad 1 - b = \frac{\sigma_2}{\sigma_1 + \sigma_2}. \]

Thus, $V(\bar{Y}_1 - \bar{Y}_2)$ is minimized when 

\[ n_1 = \left(\frac{\sigma_1}{\sigma_1 + \sigma_2}\right) n \quad \text{and} \quad n_2 = \left(\frac{\sigma_2}{\sigma_1 + \sigma_2}\right) n, \]

that is, when sample sizes are allocated proportionally to sizes of the standard deviations. 
Notice that $n_1 = n/2 = n_2$ if $\sigma_1 = \sigma_2$. 
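
The allocation rule of Example 12.1 can be sketched in a few lines of code. This is an illustration only: the function names `allocate` and `var_diff` and the values n = 40, sigma1 = 2, and sigma2 = 6 are hypothetical choices made for the sketch, and in practice the resulting sample sizes would need to be rounded to integers.

```python
def allocate(n, sigma1, sigma2):
    """Split n observations in proportion to the two standard deviations."""
    b = sigma1 / (sigma1 + sigma2)
    return b * n, (1 - b) * n


def var_diff(n1, n2, sigma1, sigma2):
    """Variance of the difference of two independent sample means."""
    return sigma1 ** 2 / n1 + sigma2 ** 2 / n2


# Hypothetical setting, not taken from the text.
n, sigma1, sigma2 = 40, 2.0, 6.0
n1, n2 = allocate(n, sigma1, sigma2)

print("proportional allocation:", n1, n2)
print("variance, proportional :", var_diff(n1, n2, sigma1, sigma2))
print("variance, equal split  :", var_diff(n / 2, n / 2, sigma1, sigma2))
```

Under these assumed values, the population with the larger standard deviation receives three times as many observations, and the variance of the estimator is smaller than under an equal split.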


FIGURE 12.1 Fitting a straight line by the method of least squares 


As a second example, consider the problem of fitting a straight line through a set of 
n points by using the least-squares method of Chapter 11 (see Figure 12.1). Further, 
suppose that we are primarily interested in the slope $\beta_1$ of the line in the linear model 

\[ Y = \beta_0 + \beta_1 x + \varepsilon. \]

If we have the option of selecting the n values of x for which y will be observed, which 
values of x will maximize the quantity of information on $\beta_1$? We have one quantitative 
independent variable x, and our problem is to decide on the values $x_1, x_2, \ldots, x_n$ to 
employ, as well as the number of observations to take at each of these values. 


FIGURE 12.2 A good design for fitting a straight line (n = 10) 

The best design for estimating the slope $\beta_1$ can be determined by considering the 
standard deviation of $\hat{\beta}_1$: 

\[ \sigma_{\hat{\beta}_1} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}. \]

The larger $S_{xx}$, the sum of squares of deviations of $x_1, x_2, \ldots, x_n$ about their mean, 
the smaller the standard deviation of $\hat{\beta}_1$ will be. That is, we obtain a better estimator 
for the slope if the values of x are spread farther apart. In some cases, the experimenter 
has some experimental region (say, $x_1 \le x \le x_2$) over which he or she wishes to 
observe Y, and this range is frequently selected prior to experimentation. Then the 
smallest value for $\sigma_{\hat{\beta}_1}$ occurs when the n data points are equally divided, with half 
located at the lower boundary $x_1$ of the region and half at the upper boundary $x_2$. 
(The proof is omitted.) An experimenter who wished to fit a line by using n = 10 
data points in the interval $2 \le x \le 6$ would select five data points at x = 2 and five 
at x = 6. 

Before concluding the discussion of this example, you should notice that 
observing all values of Y at only two values of x will not provide information on 
curvature of the response curve in case the assumption of linearity in the relation of 
E(Y) and x is incorrect. It is frequently safer to select a few points (as few as one 
or two) somewhere near the middle of the experimental region to detect curvature 
if it should be present (see Figure 12.2). A further comment is in order. One of the 
assumptions that we have made regarding the simple linear regression model is that 
the variance of the error term $\varepsilon$ does not depend on the value of the independent 
variable x. If the x values are more spread out, the validity of this assumption may 
become more questionable. 
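
The effect of the design on the precision of the slope estimator can be illustrated by computing the sum of squares of the x-values about their mean for a few candidate designs on the interval from 2 to 6 with n = 10. This is a rough sketch: the helper `sxx` and the particular "compromise" design (four points at each boundary and two at the midpoint) are assumptions made here for illustration, not designs prescribed by the text.

```python
import math


def sxx(xs):
    """Sum of squared deviations of the x-values about their mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


# Three candidate designs with n = 10 points on the interval [2, 6].
endpoints = [2.0] * 5 + [6.0] * 5                     # half at each boundary
compromise = [2.0] * 4 + [4.0] * 2 + [6.0] * 4        # two midpoints (assumed)
equal_spacing = [2.0 + 4.0 * i / 9 for i in range(10)]

for name, design in [("endpoints", endpoints),
                     ("compromise", compromise),
                     ("equal spacing", equal_spacing)]:
    s = sxx(design)
    # The standard deviation of the slope estimator is sigma / sqrt(Sxx),
    # so this ratio compares each design with the endpoint design.
    ratio = math.sqrt(sxx(endpoints) / s)
    print(f"{name:14s} Sxx = {s:7.3f}   sd relative to endpoints = {ratio:.3f}")
```

The endpoint design gives the largest sum of squares, and moving a couple of points to the middle of the region costs only a modest increase in the standard deviation of the slope estimator while allowing curvature to be detected.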

To summarize, we have given good designs (allocation of experimental units per 
population and selection of settings for the independent variable x) for comparing 
a pair of means and fitting a straight line. These two simple designs illustrate how 
information in an experiment can be increased or decreased, depending on where 
observations are made and on the allocation of sample sizes. In the next section, we 
consider a method for controlling the amount of inherent variability in an experiment. 


Exercises 


12.1 Suppose that you wish to compare the means for two populations and that $\sigma_1^2 = 9$, $\sigma_2^2 = 25$, 
and n = 90. What allocation of n = 90 to the two samples will result in the maximum amount 
of information about $(\mu_1 - \mu_2)$? 


12.2 Refer to Exercise 12.1. Suppose that you allocate $n_1 = n_2$ observations to each sample. How 
large must $n_1$ and $n_2$ be in order to obtain the same amount of information as that implied by 
the solution to Exercise 12.1? 


12.3 Suppose, as in Exercise 12.1, that two populations have respective variances $\sigma_1^2 = 9$ and 
$\sigma_2^2 = 25$. Find the smallest sample size and the corresponding sample allocation that will yield 
a 95% confidence interval for $\mu_1 - \mu_2$ that is 2 units in length. 


12.4 Refer to Exercise 12.3. How many observations are needed for a 95% confidence interval to 
be 2 units in length if $n_1 = n_2$? 


12.5 Suppose that we wish to study the effect of the stimulant digitalis on the blood pressure Y of 
rats over a dosage range of x = 2 to x = 5 units. The response is expected to be linear over 
the region; that is, $Y = \beta_0 + \beta_1 x + \varepsilon$. Six rats are available for the experiment, and each rat 
can receive only one dose. What dosages of digitalis should be employed in the experiment, 
and how many rats should be run at each dosage to maximize the quantity of information in 
the experiment relative to the slope $\beta_1$? 


12.6 Refer to Exercise 12.5. Consider two methods for selecting the dosages. Method 1 assigns 
three rats to the dosage x = 2 and three rats to x = 5. Method 2 equally spaces the dosages 
between x = 2 and x = 5 (x = 2, 2.6, 3.2, 3.8, 4.4, and 5.0). Suppose that $\sigma$ is known and that 
the relationship between E(Y) and x is truly linear (see Chapter 11). If we use the data from 
both methods to construct confidence intervals for the slope $\beta_1$, which method will yield the 
longer interval? How much longer is the longer interval? If we use method 2, approximately 
how many observations will be required to obtain an interval the same length as that obtained 
by the optimal assignment of method 1? 


12.7 Refer to Exercise 12.5. Why might it be advisable to assign one or two points at x = 3.5? 


12.8 The standard error of the estimator $\hat{\beta}_1$ in a simple linear regression model gets smaller as $S_{xx}$ 
increases, that is, as the x-values become more spread out. Why don't we always spread the 
x-values out as much as possible? 


The Matched-Pairs Experiment 


In Chapters 8 and 10, we considered methods for comparing the means of two popula- 
tions based on independent samples from each. In the previous section, we examined 
how to determine the sizes of the samples from the two populations so that the standard 
error of the estimator $\bar{Y}_1 - \bar{Y}_2$ is minimized. In many experiments, the samples 
are paired rather than independent. A commonly occurring situation is one where 
repeated observations are made on the same sampling unit, such as weighing the 
same individual before and after he or she participated in a weight-loss program. In a 
medical experiment, we might pair individuals who are of the same gender and have 
similar weights and ages. One individual from each pair is randomly selected to 
receive one of two competing medications to control hypertension whereas the other 
individual from the same pair receives the other medication. 

Comparing two populations on the basis of paired data can be a very effective 
experimental design that can control for extraneous sources of variability and result 
in decreasing the standard error of the estimator $\bar{Y}_1 - \bar{Y}_2$ for the difference in the 
population means $\mu_1 - \mu_2$. Let $(Y_{1i}, Y_{2i})$, for i = 1, 2, ..., n, denote a random sample 
of paired observations. Assume that 

\[ E(Y_{1i}) = \mu_1, \quad \mathrm{Var}(Y_{1i}) = \sigma_1^2, \quad E(Y_{2i}) = \mu_2, \quad \mathrm{Var}(Y_{2i}) = \sigma_2^2, \quad \mathrm{Cov}(Y_{1i}, Y_{2i}) = \rho\sigma_1\sigma_2, \]

where $\rho$ is the common correlation coefficient of the variables within each pair (see 
Section 5.7). Define $D_i = Y_{1i} - Y_{2i}$, for i = 1, 2, ..., n, the differences between 
the observations within each pair. Because the pairs of observations were assumed 
to be independent and identically distributed, the $D_i$-values, for i = 1, 2, ..., n, are 
independent and identically distributed; using Theorem 5.12, we see that 

\[ \mu_D = E(D_i) = E(Y_{1i}) - E(Y_{2i}) = \mu_1 - \mu_2, \]
\[ \sigma_D^2 = \mathrm{Var}(D_i) = \mathrm{Var}(Y_{1i}) + \mathrm{Var}(Y_{2i}) - 2\,\mathrm{Cov}(Y_{1i}, Y_{2i}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2. \]


From these considerations, a natural estimator for $\mu_1 - \mu_2$ is the average of the 
differences $\bar{D} = \bar{Y}_1 - \bar{Y}_2$, and 

\[ E(\bar{D}) = \mu_D = \mu_1 - \mu_2, \]
\[ \sigma_{\bar{D}}^2 = \mathrm{Var}(\bar{D}) = \frac{\sigma_D^2}{n} = \frac{1}{n}\left[\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right]. \]

If the data had been obtained from an independent samples experiment and $n_1 = n_2 = n$, 

\[ E(\bar{Y}_1 - \bar{Y}_2) = \mu_1 - \mu_2, \]
\[ \sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2 = \frac{1}{n}\left[\sigma_1^2 + \sigma_2^2\right]. \]

If it is reasonable to believe that within the pairs $(Y_{1i}, Y_{2i})$, for i = 1, 2, ..., n, 
the values of $Y_{1i}$ and $Y_{2i}$ will tend to increase or decrease together ($\rho > 0$), then 
an examination of the preceding expressions for $\sigma_{\bar{D}}^2$ in the matched-pairs experiment 
and $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2$ in the independent samples experiment shows that the matched-pairs 
experiment provides an estimator with smaller variance than does the independent 
samples experiment. In Exercise 12.11, you are asked to decide when the two experiments 
will yield estimators with the same variance and when the independent samples 
experiment will give the estimator with the smaller variance. 
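
The reduction in variance from pairing can be seen numerically by evaluating the two variance expressions side by side. The sketch below is only an illustration: the function names and the values n = 10, sigma1 = sigma2 = 2, and rho = 0.8 are hypothetical and were chosen to represent a strongly positive within-pair correlation.

```python
def var_paired(n, sigma1, sigma2, rho):
    """Variance of the mean difference in a matched-pairs experiment."""
    return (sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2) / n


def var_independent(n, sigma1, sigma2):
    """Variance of the difference of means for independent samples of size n."""
    return (sigma1 ** 2 + sigma2 ** 2) / n


# Hypothetical values with a positive within-pair correlation.
n, sigma1, sigma2, rho = 10, 2.0, 2.0, 0.8

print("matched pairs      :", round(var_paired(n, sigma1, sigma2, rho), 4))
print("independent samples:", round(var_independent(n, sigma1, sigma2), 4))
```

With these assumed values the matched-pairs estimator has one-fifth the variance of the independent samples estimator, which reflects the term involving the correlation being subtracted in the first expression.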
Because pairing samples makes the observations within each pair dependent, we 
cannot use the methods that were previously developed to compare populations based 
on independent samples from each. The analysis of a matched-pairs experiment uses 
the n paired differences, $D_i$, for i = 1, 2, ..., n. Inferences regarding the differences 
in the means $\mu_1 - \mu_2$ are made by making inferences regarding the mean of the 
differences, $\mu_D$. Define 

\[ \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i \quad \text{and} \quad S_D^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (D_i - \bar{D})^2 \]

and employ the appropriate one-sample procedure to complete the inference. If the 
number of pairs, and hence the number of differences, is large (say, n > 30), the 
large-sample inferential methods developed in Chapters 8 and 10 can be used. If 
the number of differences n is small and it is reasonable to assume that the differences 
are approximately normally distributed, we can use inferential methods based on the 
t distribution. We illustrate with the following example. 


EXAMPLE 12.2 

We wish to compare two methods for determining the percentage of iron ore in 
ore samples. Because inherent differences in the ore samples would be likely to 
contribute unwanted variability in the measurements that we observe, a matched-pairs 
experiment was created by splitting each of 12 ore samples into two parts. 
One-half of each sample was randomly selected and subjected to method 1; the other 
half was subjected to method 2. The results are presented in Table 12.1. Do the data 
provide sufficient evidence that method 2 yields a higher average percentage than 
method 1? Test using $\alpha = .05$. 


Solution 

We have formed the differences in Table 12.1 by taking the method 1 measurement 
and subtracting the corresponding method 2 measurement. If the mean percentage 
for method 2 is larger, then $\mu_D = \mu_1 - \mu_2 < 0$. Thus, we test 

\[ H_0: \mu_D = 0 \quad \text{versus} \quad H_a: \mu_D < 0. \]


Table 12.1 Data for the matched-pairs experiment in Example 12.2 

Ore Sample    Method 1    Method 2    d_i 
1             38.25       38.27       -.02 
2             31.68       31.71       -.03 
3             26.24       26.22       +.02 
4             41.29       41.33       -.04 
5             44.81       44.80       +.01 
6             46.37       46.39       -.02 
7             35.42       35.46       -.04 
8             38.41       38.39       +.02 
9             42.68       42.72       -.04 
10            46.71       46.76       -.05 
11            29.20       29.18       +.02 
12            30.76       30.79       -.03 


$\bar{d} = -.0167$ 

For these data, 

\[ s_D^2 = \frac{\sum_{i=1}^{n} d_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} d_i\right)^2}{n - 1} = \frac{.0112 - \frac{1}{12}(-.20)^2}{11} = .0007. \]

If it is reasonable to assume that the differences are normally distributed, it follows 
that 

\[ t = \frac{\bar{d} - 0}{s_D/\sqrt{n}} = \frac{-.0167}{\sqrt{.0007}/\sqrt{12}} = -2.1865 \]

is the observed value of a statistic that under the null hypothesis has a t distribution 
with n - 1 = 11 degrees of freedom (df). Using Table 5, Appendix 3, with $\alpha = .05$, 
we reject $H_0$ if t < -1.796. Hence, sufficient evidence exists to conclude that 
method 2 yields a higher average percentage than does 
method 1. Again, using Table 5, Appendix 3, it follows that .025 < p-value < .05. 
The applet Student's t Probabilities and Quantiles gives the exact p-value = P(t < 
-2.1865) = P(t > 2.1865) = .02564. 
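
The arithmetic of Example 12.2 can be reproduced directly from the data in Table 12.1. The sketch below uses only the Python standard library; the variable names are choices made here for illustration. It carries full precision rather than rounding the sample variance to .0007, so the statistic it computes is about -2.16 instead of the -2.1865 obtained in the text from rounded intermediate values. The conclusion at the .05 level is the same either way.

```python
import math
import statistics

# Data from Table 12.1 (percentage of iron ore by each method).
method1 = [38.25, 31.68, 26.24, 41.29, 44.81, 46.37,
           35.42, 38.41, 42.68, 46.71, 29.20, 30.76]
method2 = [38.27, 31.71, 26.22, 41.33, 44.80, 46.39,
           35.46, 38.39, 42.72, 46.76, 29.18, 30.79]

# Paired differences, method 1 minus method 2, rounded to two decimals
# to remove floating-point noise from the subtraction.
d = [round(a - b, 2) for a, b in zip(method1, method2)]

n = len(d)
d_bar = statistics.mean(d)        # sample mean of the differences
s2_d = statistics.variance(d)     # sample variance with divisor n - 1
t_stat = d_bar / math.sqrt(s2_d / n)

print(f"n = {n}, d_bar = {d_bar:.4f}, s_D^2 = {s2_d:.6f}, t = {t_stat:.4f}")

# Lower-tail rejection region from the text: reject H0 if t < -1.796 (11 df).
print("reject H0 at alpha = .05:", t_stat < -1.796)
```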


Although the results in Example 12.2 imply that the results of the experiment 
are statistically significant, we can assess the practical significance of the result by 
forming a confidence interval for $\mu_D$. If it is reasonable to assume that the differences 
within each pair are approximately normally distributed, a $100(1 - \alpha)\%$ confidence 
interval for $\mu_D = \mu_1 - \mu_2$ is given by 

\[ \bar{D} \pm t_{\alpha/2}\left(\frac{S_D}{\sqrt{n}}\right), \]

where $t_{\alpha/2}$ is based on n - 1 df (recall that n is the number of pairs of observations). 


EXAMPLE 12.3 

Use the data from Example 12.2 to form a 95% confidence interval for the difference 
in mean percentage readings using methods 1 and 2. 


Solution 

From Example 12.2, we observe that 

\[ \bar{d} = -.0167, \quad s_D^2 = .0007, \quad n - 1 = 11. \]

Because, with 11 df, $t_{.025} = 2.201$, the desired interval is 

\[ -.0167 \pm (2.201)\left(\frac{\sqrt{.0007}}{\sqrt{12}}\right), \]

or (-.0335, +.0001). 
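
The interval in Example 12.3 can be checked with a short calculation. This sketch simply plugs in the rounded summary values and the tabled t value quoted in the example; the variable names are illustrative.

```python
import math

# Rounded summary values and tabled t value quoted in Example 12.3.
d_bar = -0.0167     # mean of the paired differences
s2_d = 0.0007       # sample variance of the differences
n = 12              # number of pairs
t_crit = 2.201      # t_.025 with 11 df, from the text

half_width = t_crit * math.sqrt(s2_d / n)
lower, upper = d_bar - half_width, d_bar + half_width

print(f"95% confidence interval: ({lower:.4f}, {upper:.4f})")
```

The interval barely covers zero even though the one-sided test in Example 12.2 rejected at the .05 level, because a two-sided 95% interval corresponds to a two-sided test at that level.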


The preceding methods based on the t distribution can be validly employed if it 
is reasonable to assume that the differences are normally distributed. When we compared 
two population means based on small independent samples, we required that 
the population variances be equal. The validity of the matched-pairs analysis does not 
require the assumption of equal population variances. The quantity $S_D^2$ provides an 
unbiased estimator for the variance of the differences, $\sigma_D^2$, regardless of the values of 
$\sigma_1^2$, $\sigma_2^2$, and $\rho$. The independent samples t test also required that both samples were 
taken from normally distributed populations. One way that the differences within pairs 
can be normally distributed is if $Y_{1i}$, for i = 1, 2, ..., n, and $Y_{2i}$, for i = 1, 2, ..., n, 
are themselves normally distributed. However, it is possible that the pairwise differences 
will be normally distributed even if the $Y_{1i}$'s and $Y_{2i}$'s are not. Exercise 12.17 
presents an example of such a situation. Thus, the assumption that the differences be 
normally distributed is less restrictive than the assumption that both populations are 
normally distributed. 

We have seen that the matched-pairs experiment can be used to decrease the inher- 
ent variability present in the data. Further, in many situations, the assumptions required 
to validly employ a matched-pairs analysis are less restrictive than the corresponding 
independent samples methods. Why do statistical analysts encounter matched-pairs 
data? Sometimes the matched-pairs experiment was performed by design, taking into 
account the considerations previously discussed. Other times, data were obtained via 
the matched-pair experiment because of convenience. Whatever the reason for con- 
ducting a matched-pairs experiment, the resulting data should not be analyzed using 
a method appropriate for data obtained using independent samples. 

Recall that the data from a matched-pairs experiment are analyzed by focusing on 
the differences of the observations within each pair. Thus, some statisticians prefer to 
refer to the matched-pairs experiment as a paired-difference experiment. In the next 
section, we present some common terminology associated with experimental designs 
and consider extensions of the independent samples experiment and the matched-pairs 
experiment. 


Exercises 


12.9 Consider the data analyzed in Examples 12.2 and 12.3. 


a Assuming that both the methods used to analyze the samples worked reasonably well, why 
do you think that the observations on the two halves of each ore sample will be positively 
correlated? 

b Do you think that we should have taken independent observations using the two methods, 
or should we have conducted the paired analysis contained in the text? Why? 


12.10 Two computers often are compared by running a collection of various “benchmark” programs 
and recording the difference in CPU time required to complete the same program. Six benchmark 
programs, run on two computers, produced the following table of CPU times (in minutes). 

                     Benchmark Program 
Computer      1       2       3       4       5       6 
1           1.12    1.73    1.04    1.86    1.47    2.10 
2           1.15    1.72    1.10    1.87    1.46    2.15 


a Do the data provide sufficient evidence to indicate a difference in mean CPU times required 
for the two computers to complete a job? Test using $\alpha = .05$. 


b Give bounds for the associated p-value. 

c Find a 95% confidence interval for the difference in mean CPU time required for the two 
computers to complete a job. 


12.11 When $Y_{1i}$, for i = 1, 2, ..., n, and $Y_{2i}$, for i = 1, 2, ..., n, represent independent samples from 
two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, we determined 
that $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2 = (1/n)(\sigma_1^2 + \sigma_2^2)$. If the samples were paired and we computed the differences, 
$D_i$, for i = 1, 2, ..., n, we determined that $\sigma_{\bar{D}}^2 = (1/n)(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$. 

a When is $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2$ greater than $\sigma_{\bar{D}}^2$? 

b When is $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2$ equal to $\sigma_{\bar{D}}^2$? 

c When is $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}^2$ less than $\sigma_{\bar{D}}^2$? 

d Based on the discussion in the text and your answers to parts (a) through (c), when would it be better 
to implement the matched-pairs experiment and when would it be better to implement the 
independent samples experiment? 


12.12 Refer to Exercise 12.11. Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The table values used to implement a 
test of hypothesis or construct a confidence interval depend, for small samples, on the number 
of degrees of freedom associated with the estimates for $\sigma^2$ or $\sigma_D^2$. 

a Assuming two independent samples, each of size n, and that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, how many 
degrees of freedom are associated with the estimator for the common variance $\sigma^2$? 

b Assuming a matched-pairs experiment consisting of n pairs of observations, how many 
degrees of freedom are associated with the estimator of $\sigma_D^2$? 

c Assume that all of the assumptions necessary to implement the independent samples t 
procedures are satisfied and that we want to find a 95% confidence interval for the difference 
in means. What are the values of $t_{.025}$ used to construct confidence intervals for the difference 
in means based on the independent samples and matched-pairs experiments if n = 5? 
If n = 10? If n = 30? 

d If all of the assumptions necessary to implement the independent samples t procedures 
are satisfied, identify a possible disadvantage to implementing a matched-pairs experiment 
rather than taking independent samples. 


12.13 Exercise 10.76 describes a dental experiment conducted to investigate the effectiveness of an 
oral rinse used to inhibit the growth of plaque on teeth. Subjects were divided into two groups: 
One group used a rinse containing the antiplaque agent, and the control group used a rinse 
with only inactive ingredients. Another experiment has been performed to assess the growth of 
plaque for individuals who have used the rinse with the antiplaque agent. For each person in the 
study, plaque buildup was measured 4 hours after using the rinse and again after 8 hours. If you 
wanted to compare the mean plaque buildup for the two different times, would you implement 
an analysis based on a matched-pairs or independent samples procedure? Why? 


12.14 Two procedures for sintering copper are to be compared by testing each procedure on six
different types of powder. The measurement of interest is the porosity (volume percentage due
to voids) of each test specimen. The results of the tests are as shown in the accompanying table.


Powder   Procedure I   Procedure II
1        21            23
2        27            26
3        18            21
4        22            24
5        26            25
6        19            16


Is there sufficient evidence to claim that procedure II produces higher mean porosity values?
Give bounds for the p-value. What would you conclude at the $\alpha = .05$ level?

12.15 A plant manager, in deciding whether to purchase a machine of design A or design B, checks
the times for completing a certain task on each machine. Eight technicians were used in the 
experiment, with each technician using both machine A and machine B in a randomized order. 
The times (in seconds) required to complete the task are given in the accompanying table. 


Technician   A    B
1            32   30
2            40   39
3            42   42
4            26   23
5            35   36
6            29   27
7            45   41
8            22   21


a Test to see if there is a significant difference between mean completion times, at the 5%
significance level.

b Do you think pairing on technicians was worthwhile in this case? Explain.

c What assumptions are necessary for the test in part (a)?


12.16 “Muck” is the rich, highly organic type of soil that serves as the primary growth medium for
vegetation in the Florida Everglades. Because of the high concentration of organic material, 
muck can be destroyed over time by a variety of natural and human-made causes. Members of 
the Florida Game and Fresh Water Fish Commission staked out several plots in the Everglades. 
The depth of muck at each location was measured when each plot was marked and again 
6 years later. The following table identifies a portion of the data (given in inches) obtained. 


Plot Initial Reading Later Reading Plot Initial Reading Later Reading 

1 34.5 31.5 9 44.0 35.2 
2 44.0 37.9 10 40.5 37.2 
3 34.5 35.5 11 27.0 24.7 
4 27.0 23.0 12 29.5 25.8 
5 37.0 34.5 13 31.5 29.0 
6 40.0 31] 14 35.0 36.8 
7 47.2 46.0 15 44.0 36.5 
8 35.2 31.0 


a Test to see if there is sufficient evidence to indicate a decrease in average muck depth during 
the study period. Give bounds on the associated p-value. What would you conclude if you 
desired to implement an $\alpha = .01$ level test? (Although you are free to take the necessary
differences in any order that you prefer, the answer provided at the back of the book assumes 
that the differences were formed by taking later readings minus initial readings.) 

b Give a 95% confidence interval for the difference in mean muck depths at the end and 
beginning of the study. Interpret this interval. [See the remark following part (a).] 

c Give a 95% confidence interval for the initial mean muck depth in the portion of the 
Everglades in which the study was conducted. 


d Repeat the instructions of part (c) for later readings. 


e What assumptions are necessary to apply the techniques you used in answering parts (a)
and (b)? Parts (c) and (d)?

12.17 Refer to the matched-pairs experiment and assume that the ith measurement (i = 1, 2), in the
jth pair, where j = 1, 2, ..., n, is

$Y_{ij} = \mu_i + U_j + \varepsilon_{ij}$,

where $\mu_i$ = expected response for population i, where i = 1, 2,

$U_j$ = a random variable that is uniformly distributed on the interval (−1, +1),

$\varepsilon_{ij}$ = random error associated with the ith measurement in the jth pair.

Assume that the $\varepsilon_{ij}$'s are independent normal random variables with $E(\varepsilon_{ij}) = 0$ and
$V(\varepsilon_{ij}) = \sigma^2$, and that $U_j$ and $\varepsilon_{ij}$ are independent.

a Find $E(Y_{ij})$.

b Argue that the $Y_{ij}$'s, for j = 1, 2, ..., n, are not normally distributed. (There is no need
to actually find the distribution of the Y-values.)

c Show that $\mathrm{Cov}(Y_{1j}, Y_{2j}) = 1/3$, for j = 1, 2, ..., n.

d Show that $D_j = Y_{1j} - Y_{2j}$ are independent, normally distributed random variables.

e In parts (a)-(d), you verified that the differences within each pair can be normally distributed
even though the individual measurements within the pairs are not. Can you come up with
another example that illustrates this same phenomenon?


12.4 Some Elementary Experimental Designs


In Chapters 8 and 10, we considered methods to compare the means of two pop- 
ulations based on independent random samples obtained from each. Section 12.3 
dealt with a comparison of two population means through the matched-pairs exper- 
iment. In this section, we present general considerations associated with designing 
experiments. Specifically, we consider extensions of the independent samples and 
matched-pairs methodologies when the objective is to compare the means of more 
than two populations. 

Suppose that we wish to compare five teaching techniques, A, B, C, D, and E, and 
that we use 125 students in the study. The objective is to compare the mean scores 
on a standardized test for students taught by each of the five methods. How would 
we proceed? Even though the 125 students are in some sense representative of the 
students that these teaching methods target, are the students all identical? The answer 
is obviously no. 

There are likely to be boys and girls in the group, and the methods might not be 
equally effective for both genders. There are likely to be differences in the native 
abilities of the students in the group, resulting in some students performing better 
regardless of the teaching method used. Different students may come from families 
that place different emphases on education, and this could have an impact on the 
scores on the standardized test. In addition, there may be other differences among the 
125 students that would have an unanticipated effect on the test scores. 

Based on these considerations, we decide that it might be wise to randomly assign
25 students to each of five groups. Each group will be taught using one of the
techniques under study. The random division of the students into the five groups
achieves two objectives. First, we eliminate the possible biasing effect of individual
characteristics of the students on the measurements that we make. Second, it provides
a probabilistic basis for the selection of the sample that permits the statistician to
calculate probabilities associated with the observations in the sample and to use these
probabilities in making inferences.

The preceding experiment illustrates the basic components of a designed experi- 
ment. The experimental units in this study are the individual students. 


Experimental units are the objects upon which measurements are taken. 


This experiment involves a single factor—namely, method of teaching. In this 
experiment, the factor has five levels: A, B, C, D, and E. 


Factors are variables completely controlled by the experimenter. The intensity 
level (distinct subcategory) of a factor is called its level. 


In a single-factor experiment like the preceding one, each level of the single factor 
represents a treatment. Thus, in our education example, there are five treatments, 
one corresponding to each of the teaching methods. As another example, consider 
an experiment conducted to investigate the effect of various amounts of nitrogen and 
phosphate on the yield of a variety of corn. An experimental unit would be a specified 
acreage—say, 1 acre—of corn. A treatment would be a fixed number of pounds of 
nitrogen $x_1$ and of phosphate $x_2$ applied to a given acre of corn. For example, one
treatment might be to use $x_1 = 100$ pounds of nitrogen per acre and $x_2 = 200$ pounds
of phosphate. A second treatment might correspond to $x_1 = 150$ and $x_2 = 100$. Notice
that the experimenter could use different amounts $(x_1, x_2)$ of nitrogen and phosphate
and that each combination would represent a different treatment.


A treatment is a specific combination of factor levels. 


The preceding experiment for comparing teaching methods A, B, C, D, and E 
entailed randomly dividing the 125 students into five groups, each of size 25. Each 
group received exactly one of the treatments. This is an example of a completely 
randomized design. 
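
The random division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `completely_randomized_design`, the numeric student labels 1 through 125, and the fixed seed are hypothetical choices made here, not part of the text.

```python
import random

def completely_randomized_design(units, treatments, sizes, seed=None):
    """Randomly divide `units` into one subgroup per treatment.

    `sizes[i]` is the number of units that receive `treatments[i]`.
    """
    if len(treatments) != len(sizes) or sum(sizes) != len(units):
        raise ValueError("sizes must match treatments and sum to the number of units")
    rng = random.Random(seed)   # seeded only so the illustration is repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)       # every ordering of the units is equally likely
    assignment = {}
    start = 0
    for treatment, size in zip(treatments, sizes):
        assignment[treatment] = shuffled[start:start + size]
        start += size
    return assignment

# Hypothetical labels: students numbered 1..125, teaching methods A..E.
students = list(range(1, 126))
groups = completely_randomized_design(students, list("ABCDE"), [25] * 5, seed=1)
for method, members in groups.items():
    print(method, sorted(members)[:5], "...")
```

Shuffling the full list once and then cutting it into consecutive pieces gives each student the same chance of landing in any group, which is the property the randomization argument relies on.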


A completely randomized design to compare k treatments is one in which a
group of n relatively homogeneous experimental units are randomly divided
into k subgroups of sizes $n_1, n_2, \ldots, n_k$ (where $n_1 + n_2 + \cdots + n_k = n$).
All experimental units in each subgroup receive the same treatment, with each
treatment applied to exactly one subgroup.

Associated with each treatment is a population (often conceptual) consisting of
all observations that would have resulted if the treatment were repeatedly applied. In 
the teaching example, we could envision a population of all possible test scores if 
all students were taught using method A. Corresponding conceptual populations are 
associated with each of the other teaching methods. Thus, each treatment has a corre- 
sponding population of measurements. The observations obtained from a completely 
randomized design are typically viewed as being independent random samples taken 
from the populations corresponding to each of the treatments. 

Suppose that we wish to compare five brands of aspirin, A, B, C, D, and E, re- 
garding the mean amount of active ingredient per tablet for each of the brands. We 
decide to select 100 tablets randomly from the production of each manufacturer and 
use the results to implement the comparison. In this case, we physically sampled 
five distinct populations. Although we did not “apply” the different treatments to 
a homogeneous batch of blank tablets, it is common to refer to this experiment as 
involving a single factor (manufacturer) and five treatments (corresponding to the 
different manufacturers). Thus, in this example, for each population, we identify a 
corresponding treatment. Regardless of whether we have implemented a completely 
randomized design or taken independent samples from each of several existing 
populations, a one-to-one correspondence is established between the populations and 
the treatments. Both of these scenarios, in which independent samples are taken from 
each of k populations, are examples of a one-way layout. 


A one-way layout to compare k populations is an arrangement in which inde- 
pendent random samples are obtained from each of the populations of interest. 


Thus, a one-way layout, whether it corresponds to data obtained by using a com- 
pletely randomized design or by taking independent samples from each of several 
existing populations, is the extension of the independent samples experiments that 
we considered in Chapters 8 and 10. Methods of analyzing data obtained from a 
one-way layout are presented in Sections 13.3-13.7. 

In Section 12.3, we saw that a matched-pairs design often yields a superior method 
for comparing the means of two populations or treatments. When we were interested in 
comparing the effectiveness of two drugs for controlling hypertension, we suggested 
forming matched pairs of individuals who were of the same sex and of similar age and 
weight. One randomly selected member of each pair received treatment 1 whereas 
the other received treatment 2. The objective was to control for extraneous sources 
of variability and thus to obtain a more precise analysis. Suppose that we wanted to 
compare three different medications instead of two. How would we proceed? Instead 
of forming several pairs of matched individuals, we could form several groups, each 
containing three members matched on sex, weight, and age. Within each group of 
three, we would randomly select one individual to receive treatment 1 and another 
to receive treatment 2, and then we would administer treatment 3 to the remaining 
member of each group. The objective of this design is identical to that of the matched- 
pairs design—namely, to eliminate unwanted sources of variability that might creep 
into the observations in our experiment. This extension of the matched-pairs design 
is called a randomized block design.

FIGURE 12.3 A randomized block design

A randomized block design containing b blocks and k treatments consists of b
blocks of k experimental units each. The treatments are randomly assigned to the
units in each block, with each treatment appearing exactly once in every block.


The difference between a randomized block design and the completely randomized 
design can be demonstrated by considering an experiment designed to compare subject 
reaction to a set of four stimuli (treatments) in a stimulus-response psychological 
experiment. We will denote the treatments as $T_1$, $T_2$, $T_3$, and $T_4$.

Suppose that eight subjects are to be randomly assigned to each of the four treat- 
ments. Random assignment of subjects to treatments (or vice versa) randomly dis- 
tributes errors due to person-to-person variability in response to the four treatments 
and yields four samples that, for all practical purposes, are random and independent. 
This is a completely randomized experimental design. 

The experimental error associated with a completely randomized design has a 
number of components. Some of these are due to the differences between subjects, 
to the failure of repeated measurements within a subject to be identical (due to the 
variations in physical and psychological conditions), to the failure of the experimenter 
to administer a given stimulus with exactly the same intensity in repeated measure- 
ments, and to errors of measurement. Reduction of any of these causes of error will 
increase the information in the experiment. 

The subject-to-subject variation in the foregoing experiment can be eliminated by 
using subjects as blocks. Each subject would receive each of the four treatments as- 
signed in a random sequence. The resulting randomized block design would appear as 
in Figure 12.3. Now only eight subjects are needed to obtain eight response measure- 
ments per treatment. Notice that each treatment occurs exactly once in each block. 

The word randomized in the name of the design implies that the treatments are 
randomly assigned within a block. For our experiment, position in the block refers 
to the position in the sequence of stimuli assigned to a given subject over time. The 
purpose of the randomization (that is, position in the block) is to eliminate bias caused 
by fatigue or learning. 
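
The within-block randomization just described can be sketched as follows. This is a minimal illustration, assuming subjects labeled 1 through 8 and treatments labeled T1 through T4; the function name `randomized_block_design` and the seed are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Give every block its own random ordering of all the treatments."""
    rng = random.Random(seed)   # seeded only so the illustration is repeatable
    design = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # independent randomization within each block
        design[block] = order
    return design

# Hypothetical labels: eight subjects act as blocks; four stimuli are the treatments.
treatments = ["T1", "T2", "T3", "T4"]
design = randomized_block_design(range(1, 9), treatments, seed=2)
for subject, order in design.items():
    print("Subject", subject, "receives", order)
```

Each subject's list is the sequence in which the stimuli would be presented, so every treatment appears exactly once in every block.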

Blocks may represent time, location, or experimental material. If three treatments
are to be compared and there is a suspected trend in the mean response over time,
a substantial part of the time-trend variation may be removed by blocking. All three
treatments would be randomly applied to experimental units in one small block of
time. This procedure would be repeated in succeeding blocks of time until the required
amount of data is collected. A comparison of the sale of competitive products in
supermarkets should be made within supermarkets, thus using the supermarkets as
blocks and removing store-to-store variability. Animal experiments in agriculture
and medicine often use animal litters as blocks, applying all the treatments, one
each, to animals within a litter. Because of heredity, animals within a litter are more
homogeneous than those between litters. This type of blocking removes litter-to-litter
variation. The analysis of data generated by a randomized block design is discussed
in Sections 13.8-13.10.

FIGURE 12.4 A Latin square design

The randomized block design is only one of many types of block designs. Blocking 
in two directions can be accomplished by using a Latin square design. Suppose that 
the subjects of the preceding example became fatigued as the stimuli were applied, 
so the last stimulus always produced a lower response than the first. If this trend (and 
consequent lack of homogeneity of the experimental units within a block) were true 
for all subjects, a Latin square design would be appropriate. The design would be 
constructed as shown in Figure 12.4. Each stimulus is applied once to each subject
and occurs exactly once in each position of the order of presentation. All four stimuli
occur in each row and in each column of the 4 × 4 configuration. The resulting design
is a 4 × 4 Latin square. A Latin square design for three treatments requires a 3 × 3
configuration; in general, p treatments require a p × p array of experimental units.
If more observations are desired per treatment, the experimenter should use several 
Latin square configurations in one experiment. In the preceding example, it would 
be necessary to run two Latin squares to obtain eight observations per treatment. The 
experiment would then contain the same number of observations per treatment as the 
randomized block design (Figure 12.3). 
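
One simple way to build a p × p Latin square is sketched below. It is an illustration only, under stated assumptions: it starts from a cyclic square and then randomly permutes rows, columns, and treatment labels, which always produces a valid Latin square but does not sample uniformly from all possible Latin squares. The function name `latin_square` and the labels T1 through T4 are hypothetical.

```python
import random

def latin_square(treatments, seed=None):
    """Return a p x p Latin square as a list of rows.

    Rows stand for positions in the order of presentation and columns
    for subjects, following the stimulus-response example.
    """
    p = len(treatments)
    rng = random.Random(seed)   # seeded only so the illustration is repeatable
    rows = list(range(p))
    cols = list(range(p))
    labels = list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    # Cell (r, c) of the cyclic square holds label (r + c) mod p, so each label
    # occurs exactly once per row and per column; permuting rows and columns
    # keeps that property.
    return [[labels[(r + c) % p] for c in cols] for r in rows]

square = latin_square(["T1", "T2", "T3", "T4"], seed=3)
for row in square:
    print(" ".join(row))
```

Running the function twice with different seeds would give the two Latin squares needed to obtain eight observations per treatment in the example.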

A comparison of means for any pair of stimuli would eliminate the effect of subject- 
to-subject variation, but it would also eliminate the effect of the fatigue trend within 
each stimulus because each treatment was applied in each position of the stimuli-time 
administering sequence. Consequently, the effect of the trend would be canceled in 
comparing the means. A more extensive discussion of block designs and their analyses 
is contained in the texts listed in the references at the end of the chapter. 

The objective of this section has been to present some of the basic considerations
in designing experiments. We have discussed the role of randomization in all
well-designed experiments and have focused on extensions of the independent samples
and matched-pairs experiments to situations in which we wish to compare more than
two treatments. Particularly, we pointed out the existence of block designs, how they
work, and how they can produce substantial increases in the quantity of information
obtained from an experiment by reducing nuisance variation.


Exercises


12.18 Two drugs, A and B, are to be applied to five rats each. Suppose that the rats are numbered
from 1 to 10. Use the random number table to assign the rats randomly to the two treatments.

12.19 Refer to Exercise 12.18. Suppose that the experiment involved three drugs, A, B, and C, with 5
rats assigned to each. Use the random number table to assign the 15 rats randomly to the three
treatments.

12.20 A chemical engineer has two catalysts and three temperature settings that she wishes to use in
a series of experiments.

a How many treatments (factor-level combinations) are there in this experiment? Carefully
describe one of these treatments.

b Each experiment makes use of one catalyst-temperature combination. Show how you
would use a random number table to randomize the order of the experiments.

12.21 Give two reasons for utilizing randomization in an experiment.

12.22 What is a factor?

12.23 What is a treatment?

12.24 Could a variable be a factor in one experiment and a nuisance variable (source of extraneous
variation) in another?

12.25 If you were to design an experiment, what part of the design procedure would increase the
accuracy of the experiment? What part of the design procedure would decrease the impact of
extraneous sources of variability?

12.26 An experiment is to be conducted to compare the effect of digitalis on the contraction of the heart
muscles of rats. The experiment is conducted by removing the heart from a live rat, slicing the
heart into thin layers, and treating the layers with dosages of digitalis. The muscle contraction
is then measured. If four dosages, A, B, C, and D, are to be employed, what advantage might
be derived by applying A, B, C, and D to a slice of tissue from the heart of each rat? What
principle of design is illustrated by this example?

12.27 Complete the assignment of treatments for the following 3 × 3 Latin square design.


12.5 Summary


The objective of this chapter has been to identify the factors that affect the quantity of 
information in an experiment and to use this knowledge to design better experiments. 
The design of experiments is a very broad subject and certainly one not susceptible to 
condensation into a single chapter in an introductory text. However, the philosophy 
underlying design, some methods for varying information in an experiment, and some 
desirable strategies for design are easily explained. 

We have seen that the amount of information pertinent to a parameter of interest 
depends on the selection of factor-level combinations (treatments) to be included 
in the experiment and on the allocation of the total number of experimental units to the 
treatments. Randomization is an important component of any designed experiment. 
The use of randomization helps eliminate biases in experimental results and provides 
the theoretical basis for computing the probabilities that are key to the inference- 
making process. Blocking—comparing treatments within relatively homogeneous 
blocks of experimental material—can be used to eliminate block-to-block variation 
when comparing treatments. As such, it serves as a filter to reduce the effect of 
unwanted sources of variability. 

The analysis of some elementary experimental designs is given in Chapter 13. A 
more extensive treatment of the design and analysis of experiments is a course in 
itself. If you are interested in exploring this subject, consult the texts listed in the 
references that follow.


Supplementary Exercises


12.28 How can one measure the information in a sample pertinent to a specific population parameter?

12.29 What is a random sample?

12.30 What factors affect the quantity of information in an experiment? What design procedures
control these factors?


12.31 Refer to the matched-pairs experiment of Section 12.3 and assume that the measurement
receiving treatment i, where i = 1, 2, in the jth pair, where j = 1, 2, ..., n, is

$Y_{ij} = \mu_i + P_j + \varepsilon_{ij}$,

where $\mu_i$ = expected response for treatment i, for i = 1, 2,

$P_j$ = additive random effect (positive or negative) contribution by the jth
pair of experimental units, for j = 1, 2, ..., n,

$\varepsilon_{ij}$ = random error associated with the experimental unit in the jth pair that
receives treatment i.

Assume that the $\varepsilon_{ij}$'s are independent normal random variables with $E(\varepsilon_{ij}) = 0$, $V(\varepsilon_{ij}) = \sigma^2$;
and assume that the $P_j$'s are independent normal random variables with $E(P_j) = 0$, $V(P_j) = \sigma_p^2$.
Also, assume that the $P_j$'s and $\varepsilon_{ij}$'s are independent.

a Find $E(Y_{ij})$.

b Find $E(\bar{Y}_i)$ and $V(\bar{Y}_i)$, where $\bar{Y}_i$ is the mean of the n observations receiving treatment i,
where i = 1, 2.

c Let $\bar{D} = \bar{Y}_1 - \bar{Y}_2$. Find $E(\bar{D})$, $V(\bar{D})$, and the probability distribution for $\bar{D}$.

12.32 Refer to Exercise 12.31. Prove that

$\dfrac{\bar{D}\sqrt{n}}{S_D}$

possesses a t distribution, under $H_0: (\mu_1 - \mu_2) = 0$.

12.33 Refer to Exercise 12.31. Suppose that a completely randomized design is employed for
the comparison of the two treatment means. Then, a response could be modeled by the
expression

$Y_{ij} = \mu_i + P_{ij} + \varepsilon_{ij}$,

but the “pair effect” $P_{ij}$ (which will still affect an experimental unit) will be randomly
selected and will likely differ from one of the 2n observations to another. Further, in contrast to
the matched-pairs experiment, the pair effects will not cancel when you calculate $(\bar{Y}_1 - \bar{Y}_2)$.
Compare $V(\bar{Y}_1 - \bar{Y}_2) = V(\bar{D})$ for this design with the matched-pairs design of Exercise 12.31.
Why is the variance for the completely randomized design usually larger?¹


1. Exercises preceded by an asterisk are optional.

12.34 Persons submitting computing jobs to a computer center usually are required to estimate the
amount of computer time required to complete the job. This time is measured in CPUs, the
amount of time that a job will occupy a portion of the computer's central processing unit's
memory. A computer center decided to perform a comparison of the estimated versus actual
CPU times for a particular customer. The corresponding times were available for 11 jobs. The
sample data are given in the accompanying table.


CPU Time                           Job Number
(minutes)     1     2     3     4     5     6     7     8     9    10    11
Estimated   .50  1.40   .95   .45   .75  1.20  1.60   2.6  1.30   .85   .60
Actual      .46  1.52   .99   .53   .71  1.31  1.49   2.9  1.41   .83   .74


a Why would you expect the observations within each of these pairs of data to be
correlated?

b Do the data provide sufficient evidence to indicate that, on the average, the customer tends
to underestimate the CPU time required for computing jobs? Test using $\alpha = .10$.

c Find the observed significance level for the test and interpret its value.

d Find a 90% confidence interval for the difference in mean estimated CPU time versus mean
actual CPU time.


12.35 The earth's temperature affects seed germination, crop survival in inclement weather, and many
other aspects of agricultural production. Temperature at various locations can be measured 
using ground-based sensors or infrared-sensing devices mounted on aircraft or space satellites. 
Ground-based sensoring is tedious and requires many replications to obtain accurate estimates 
of ground temperature. On the other hand, airplane- or satellite-mounted sensors appear to 
introduce a bias in temperature readings. To estimate the amount of bias, both methods were 
used to measure ground temperature at five locations. The readings, measured in degrees 
Celsius, are given in the following table. 


Temperature (°C) 


Location Ground Air 
1 46.9 47.3 
2 45.4 48.1 
3 36.3 37.9 
4 31.0 32.7 
5 24.7 26.2 


a Do the data present sufficient evidence to claim a difference in average ground-temperature
readings using ground- and air-based sensors?

b Construct a 95% confidence interval for the difference in mean ground-temperature
readings using ground- and air-based sensors.

c We want to estimate the difference between mean temperature readings for ground- and 
air-based sensors to within .2°C at the 95% confidence level. Approximately how many 
paired observations (measurements at different locations) are required? 


12.36 An experiment was conducted to compare mean reaction time to two types of traffic signs:
prohibitive (no left turn) and permissive (left turn only). Ten subjects were included in the
experiment. Each subject was presented 40 traffic signs, 20 prohibitive and 20 permissive, in
random order. The mean time to reaction and the number of correct actions were recorded for
each subject. The mean reaction times to the 20 prohibitive and 20 permissive traffic signs for
each of the ten subjects are reproduced in the following table.


Mean Reaction Times (ms) for 20 Traffic Signs

Subject   Prohibitive   Permissive
1         824           702
2         866           725
3         841           744
4         770           663
5         829           792
6         764           708
7         857           747
8         831           685
9         846           742
10        759           610

a Explain why this is a matched-pairs experiment and give reasons why the pairing should 
be useful in increasing information on the difference between the mean reaction times to 
prohibitive and permissive traffic signs. 

b Do the data present sufficient evidence to indicate a difference in mean reaction times to
prohibitive and permissive traffic signs? Test using $\alpha = .05$.

c Find and interpret the approximate p-value for the test in part (b).

d Find a 95% confidence interval for the difference in mean reaction times to prohibitive and
permissive traffic signs.
*12.37 Suppose that you wish to fit the model

$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

to a set of n data points. If the n points are to be allocated at the design points x = −1, 0, and
1, what fraction should be assigned to each value of x so as to minimize $V(\hat{\beta}_2)$? (Assume that
n is large and that $k_1$, $k_2$, and $k_3$, $k_1 + k_2 + k_3 = 1$, are the fractions of the total number of
observations to be assigned at x = −1, 0, and 1, respectively.)


The Analysis of Variance


13.1 Introduction
13.2 The Analysis of Variance Procedure
13.3 Comparison of More Than Two Means: Analysis of Variance for a One-Way Layout
13.4 An Analysis of Variance Table for a One-Way Layout
13.5 A Statistical Model for the One-Way Layout
13.6 Proof of Additivity of the Sums of Squares and E(MST) for a One-Way Layout (Optional)
13.7 Estimation in the One-Way Layout
13.8 A Statistical Model for the Randomized Block Design
13.9 The Analysis of Variance for a Randomized Block Design
13.10 Estimation in the Randomized Block Design
13.11 Selecting the Sample Size
13.12 Simultaneous Confidence Intervals for More Than One Parameter
13.13 Analysis of Variance Using Linear Models
13.14 Summary

References and Further Readings


13.1 Introduction


Most experiments involve a study of the effect of one or more independent variables
on a response. Independent variables that can be controlled in an experiment are called
factors, and the intensity level of a factor is called its level.

The analysis of data generated by a multivariable experiment requires identification
of the independent variables in the experiment. These will not only be factors
(controlled independent variables) but could also be directions of blocking. If one
studies wear measurements for three types of tires, A, B, and C, on each of four
automobiles, “tire types” is a factor representing a single qualitative variable (there is
no quantitative or numerical value associated with the variable “tire type”) with three
levels. Automobiles are blocks and represent a single qualitative variable with four
levels. Responses for a Latin square design depend on the factors that represent
treatments but are also affected by two qualitative independent block variables, “rows”
and “columns.”

Methods for designing experiments to increase accuracy and to control for extraneous
sources of variation were discussed in Chapter 12. In particular, the one-way
layout and the randomized block design were shown to be generalizations of simple
designs for the independent samples and matched-pairs comparisons of means that
were discussed in Chapters 8, 10, and 12. Treatments correspond to combinations
of factor levels and identify the different populations of interest to the experimenter.
This chapter presents an introduction to the analysis of variance and gives methods for
the analysis of the one-way layout (including the completely randomized design) and
randomized block designs. The analogous methods of analysis for the Latin square
design are not presented in this chapter, but they can be found in the texts listed in
the references at the end of the chapter.


13.2 The Analysis of Variance Procedure


The method of analysis for experiments involving several independent variables can 
be explained by intuitively developing the procedure or, more rigorously, through the 
linear models approach developed in Chapter 11. We begin by presenting an intuitive 
discussion of a procedure known as the analysis of variance (ANOVA). An outline 
of the linear model approach is presented in Section 13.13. 

As the name implies, the ANOVA procedure attempts to analyze the variation 
in a set of responses and assign portions of this variation to each variable in a set 
of independent variables. Because the experimenter rarely, if ever, includes all the 
variables affecting the response in an experiment, random variation in the responses 
is observed even if all independent variables considered by the experimenter are held 
constant. The objective of the ANOVA is to identify important independent variables 
and determine how they affect the response. 

The rationale underlying the ANOVA can best be indicated with a symbolic discus- 
sion. The actual analysis—that is, how to do it—will be illustrated with an example. 

As in Chapter 11, variability of a set of n measurements is quantified by the sum
of squares of deviations $\sum_{i=1}^{n} (y_i - \bar{y})^2$. The ANOVA procedure partitions this sum
of squares of deviations, called the total sum of squares, into parts, each of which is 
attributed to one of the independent variables in the experiment, plus a remainder that 
is associated with random error. Figure 13.1 illustrates such a partitioning for three 
independent variables. If a multivariable linear model were written for the response, 
as suggested in Chapter 11, the portion of the total sum of squares assigned to error 
is labeled SSE. 

FIGURE 13.1 Partitioning of the total sum of squares of deviations. The total sum of squares, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, is split into the sums of squares for independent variables No. 1, No. 2, and No. 3 and the sum of squares for error.

For the cases that we consider and under the hypothesis that the independent
variables are unrelated to the response, each of the pieces of the total sum of squares,
divided by an appropriate constant, provides an independent and unbiased estimator
of $\sigma^2$, the variance of the experimental error. When a variable is highly related to the
response, its portion of the total sum of squares (called the sum of squares for that
variable) will be inflated. This condition can be detected by comparing the sum of
squares for that variable with the sum of squares for error, SSE. The test will be based
on a statistic that possesses an F distribution and specifies that the hypothesis of no
effect for the independent variable should be rejected if the value of F is large.

The mechanism involved in an ANOVA can best be introduced by considering a
familiar example. Assume that we wish to use information in independent samples
of sizes $n_1 = n_2$ to compare the means of two normally distributed populations with
means $\mu_1$ and $\mu_2$ and equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$. This experiment, formerly
analyzed using the independent samples t test, will now be approached from another
point of view. The total variation of the response measurements in the two samples
is quantified by (recall that $n_1 = n_2$)

$$\text{Total SS} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (Y_{ij} - \bar{Y})^2,$$

where $Y_{ij}$ denotes the jth observation in the ith sample and $\bar{Y}$ is the mean of all
$n = 2n_1$ observations. This quantity can be partitioned into two parts, as follows:

$$\text{Total SS} = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (Y_{ij} - \bar{Y})^2 = \underbrace{n_1 \sum_{i=1}^{2} (\bar{Y}_i - \bar{Y})^2}_{\text{SST}} + \underbrace{\sum_{i=1}^{2} \sum_{j=1}^{n_1} (Y_{ij} - \bar{Y}_i)^2}_{\text{SSE}}$$


(proof deferred to Section 13.6), where $\bar{Y}_i$ is the average of the observations in the ith
sample, for i = 1, 2. Let us examine the quantity SSE more closely. Recall that we
have assumed that the underlying population variances are equal and that $n_1 = n_2$.

$$\text{SSE} = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{2} (n_1 - 1) S_i^2 = (n_1 - 1) S_1^2 + (n_1 - 1) S_2^2,$$

where

$$S_i^2 = \frac{1}{n_1 - 1} \sum_{j=1}^{n_1} (Y_{ij} - \bar{Y}_i)^2.$$

Recall that, in the case $n_1 = n_2$, the "pooled" estimator for the common variance $\sigma^2$
is given by

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1) S_1^2 + (n_1 - 1) S_2^2}{n_1 + n_1 - 2} = \frac{\text{SSE}}{2n_1 - 2}.$$


We have partitioned the total sum of squares of deviations into two parts. One part,
SSE, can be divided by $2n_1 - 2$ to obtain the pooled estimator of $\sigma^2$. Because there
are only two treatments (or populations) and $n_1 = n_2$, the other part,

$$\text{SST} = n_1 \sum_{i=1}^{2} (\bar{Y}_i - \bar{Y})^2 = \frac{n_1}{2} (\bar{Y}_1 - \bar{Y}_2)^2,$$

the sum of squares for treatments (SST), will be large if $|\bar{Y}_1 - \bar{Y}_2|$ is large. Hence,
the larger SST is, the greater will be the weight of evidence to indicate a difference
between $\mu_1$ and $\mu_2$. When will SST be large enough to indicate a significant difference
between $\mu_1$ and $\mu_2$?
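The closed form for SST stated above is not derived in the text. A short derivation, which relies only on the equal sample sizes (so that the overall mean is the average of the two sample means), can be sketched as follows:

```latex
\begin{aligned}
\bar{Y} &= \tfrac{1}{2}(\bar{Y}_1 + \bar{Y}_2)
  \;\Longrightarrow\;
  \bar{Y}_1 - \bar{Y} = \tfrac{1}{2}(\bar{Y}_1 - \bar{Y}_2),
  \qquad
  \bar{Y}_2 - \bar{Y} = -\tfrac{1}{2}(\bar{Y}_1 - \bar{Y}_2), \\
\text{SST} &= n_1 \sum_{i=1}^{2} (\bar{Y}_i - \bar{Y})^2
  = n_1 \Bigl[ \tfrac{1}{4}(\bar{Y}_1 - \bar{Y}_2)^2 + \tfrac{1}{4}(\bar{Y}_1 - \bar{Y}_2)^2 \Bigr]
  = \frac{n_1}{2} (\bar{Y}_1 - \bar{Y}_2)^2 .
\end{aligned}
```

The identity fails when the sample sizes differ, because the overall mean is then a weighted average of the sample means.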

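Because we have assumed that $Y_{ij}$ is normally distributed with $E(Y_{ij}) = \mu_i$, for
i = 1, 2, and $V(Y_{ij}) = \sigma^2$ and because SSE/$(2n_1 - 2)$ is identical to the pooled
estimator of $\sigma^2$ used in Chapters 8 and 10, it follows that

$$E\left(\frac{\text{SSE}}{2n_1 - 2}\right) = \sigma^2$$

and that

$$\frac{\text{SSE}}{\sigma^2} = \sum_{j=1}^{n_1} \frac{(Y_{1j} - \bar{Y}_1)^2}{\sigma^2} + \sum_{j=1}^{n_1} \frac{(Y_{2j} - \bar{Y}_2)^2}{\sigma^2}$$

has a $\chi^2$ distribution with $2n_1 - 2$ degrees of freedom (df) (see Section 8.8).
In Section 13.6, we will derive a result implying that

$$E(\text{SST}) = \sigma^2 + \frac{n_1}{2} (\mu_1 - \mu_2)^2.$$

Notice that SST estimates $\sigma^2$ if $\mu_1 = \mu_2$ and a quantity larger than $\sigma^2$ if $\mu_1 \ne \mu_2$.
Under the hypothesis that $\mu_1 = \mu_2$, it follows that

$$Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{2\sigma^2 / n_1}}$$

has a standard normal distribution; hence,

$$Z^2 = \left(\frac{n_1}{2}\right) \frac{(\bar{Y}_1 - \bar{Y}_2)^2}{\sigma^2} = \frac{\text{SST}}{\sigma^2}$$

has a $\chi^2$ distribution with 1 df.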
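The two expectations above can be checked informally by simulation. The sketch below is only an illustration under assumed values (the sample size, means, variance, number of replications, and seed are all arbitrary choices, not taken from the text); it averages SST and MSE over many simulated two-sample experiments and compares them with $\sigma^2 + (n_1/2)(\mu_1 - \mu_2)^2$ and $\sigma^2$.

```python
import random


def simulate(n1, mu1, mu2, sigma, reps, seed=0):
    """Average SST and MSE over repeated two-sample experiments (equal sizes n1)."""
    rng = random.Random(seed)
    sst_sum = 0.0
    mse_sum = 0.0
    for _ in range(reps):
        s1 = [rng.gauss(mu1, sigma) for _ in range(n1)]
        s2 = [rng.gauss(mu2, sigma) for _ in range(n1)]
        m1, m2 = sum(s1) / n1, sum(s2) / n1
        # SSE is the pooled within-sample sum of squares.
        sse = sum((y - m1) ** 2 for y in s1) + sum((y - m2) ** 2 for y in s2)
        sst_sum += (n1 / 2) * (m1 - m2) ** 2   # SST for two equal-sized samples
        mse_sum += sse / (2 * n1 - 2)          # MSE = SSE / (2 n1 - 2)
    return sst_sum / reps, mse_sum / reps


# Assumed illustration values: n1 = 5, mu1 = 0, mu2 = 1, sigma = 1.
avg_sst, avg_mse = simulate(n1=5, mu1=0.0, mu2=1.0, sigma=1.0, reps=20000)
expected_sst = 1.0 ** 2 + (5 / 2) * (0.0 - 1.0) ** 2   # sigma^2 + (n1/2)(mu1 - mu2)^2
print(avg_sst, expected_sst, avg_mse)
```

With these settings the simulated average of SST should land near 3.5 and the average of MSE near 1, up to Monte Carlo error. Setting `mu2` equal to `mu1` makes both averages estimate $\sigma^2$.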
Notice that SST is a function of only the sample means $\bar{Y}_1$ and $\bar{Y}_2$ whereas SSE
is a function of only the sample variances $S_1^2$ and $S_2^2$. Theorem 7.3 implies that, for
i = 1, 2, the sample means $\bar{Y}_i$ and sample variances $S_i^2$ are independent. Because the
samples are assumed to be independent, it follows that SST and SSE are independent
random variables. Hence, from Definition 7.3, under the hypothesis that $\mu_1 = \mu_2$,

$$\frac{\dfrac{\text{SST}}{\sigma^2} \Big/ 1}{\dfrac{\text{SSE}}{\sigma^2} \Big/ (2n_1 - 2)} = \frac{\text{SST}/1}{\text{SSE}/(2n_1 - 2)}$$

has an F distribution with $\nu_1 = 1$ numerator degree of freedom and $\nu_2 = 2n_1 - 2$
denominator degrees of freedom.

Sums of squares divided by their respective degrees of freedom are called mean
squares. In this case, the mean square for error and the mean square for treatments
are given by

$$\text{MSE} = \frac{\text{SSE}}{2n_1 - 2} \quad \text{and} \quad \text{MST} = \frac{\text{SST}}{1}.$$

Under $H_0: \mu_1 = \mu_2$, both MST and MSE estimate $\sigma^2$. However, when $H_0$ is false
and $\mu_1 \ne \mu_2$, MST estimates something larger than $\sigma^2$ and tends to be larger than
MSE. To test $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \ne \mu_2$, we use

$$F = \frac{\text{MST}}{\text{MSE}}$$

as the test statistic.
Disagreement with the null hypothesis is indicated by a large value of F; hence,
the rejection region for a test with significance level $\alpha$ is

$$F > F_\alpha.$$

Thus, the ANOVA test results in a one-tailed F test. The degrees of freedom for F are
those associated with MST and MSE. In the present instance, as previously indicated,
F is based on $\nu_1 = 1$ and $\nu_2 = 2n_1 - 2$ numerator and denominator degrees of
freedom, respectively.

For the two-sample problem under consideration, the F test just described is 
equivalent to the two-tailed t test of Chapter 10. So why bother establishing this 
equivalence? As we will see in Section 13.3, the F test readily generalizes to allow 
comparison of any number of treatments. 
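The equivalence with the pooled t test can be made explicit. The following is a brief algebraic sketch that uses only quantities already defined in this section, with $T$ denoting the pooled two-sample t statistic for equal sample sizes:

```latex
\begin{aligned}
T &= \frac{\bar{Y}_1 - \bar{Y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_1}}}
   = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{2 S_p^2 / n_1}},
   \qquad S_p^2 = \frac{\text{SSE}}{2n_1 - 2} = \text{MSE}, \\
T^2 &= \frac{(n_1/2)(\bar{Y}_1 - \bar{Y}_2)^2}{S_p^2}
   = \frac{\text{SST}/1}{\text{SSE}/(2n_1 - 2)}
   = \frac{\text{MST}}{\text{MSE}} = F .
\end{aligned}
```

Because $|T| > t_{\alpha/2}$ exactly when $T^2 > t_{\alpha/2}^2$, and $t_{\alpha/2}^2$ with $2n_1 - 2$ df equals $F_\alpha$ with 1 and $2n_1 - 2$ df, the two tests reject for exactly the same samples.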


EXAMPLE 13.1

The coded values for a measure of elasticity in plastic prepared by two different
processes are given in Table 13.1. Independent samples, both of size 6, were taken
from the output of each process. Do the data present sufficient evidence to indicate a
difference in mean elasticity for the two processes?

Table 13.1 Data for Example 13.1

  A     B
 6.1   9.1
 7.1   8.2
 7.8   8.6
 6.9   6.9
 7.6   7.5
 8.2   7.9


Solution Although the two-sample t test of Section 10.8 could be used to analyze these data,
we will use the ANOVA F test discussed earlier in this section. The three desired
sums of squares are

$$\text{Total SS} = \sum_{i=1}^{2} \sum_{j=1}^{6} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{2} \sum_{j=1}^{6} y_{ij}^2 - \frac{1}{12} \left( \sum_{i=1}^{2} \sum_{j=1}^{6} y_{ij} \right)^2 = 711.35 - \frac{1}{12} (91.9)^2 = 7.5492,$$

$$\text{SST} = n_1 \sum_{i=1}^{2} (\bar{y}_i - \bar{y})^2 = 6 \sum_{i=1}^{2} (\bar{y}_i - \bar{y})^2 = 1.6875,$$

$$\text{SSE} = \sum_{i=1}^{2} \sum_{j=1}^{6} (y_{ij} - \bar{y}_i)^2 = 5.8617.$$

(You may verify that SSE is the pooled sum of squares of the deviations for the two
samples and that Total SS = SST + SSE.) The mean squares for treatment and error,
respectively, are

$$\text{MST} = \frac{\text{SST}}{1} = 1.6875, \qquad \text{MSE} = \frac{\text{SSE}}{2n_1 - 2} = \frac{5.8617}{10} = .58617.$$

To test the null hypothesis $\mu_1 = \mu_2$, we compute the value of the test statistic

$$F = \frac{\text{MST}}{\text{MSE}} = \frac{1.6875}{.58617} = 2.88$$

and reject $H_0$ if the calculated value of F exceeds $F_\alpha$. The critical value of the F
statistic with 1 numerator degree of freedom and 10 denominator degrees of freedom
for $\alpha = .05$ is $F_{.05} = 4.96$. Although the MST is almost three times the MSE, it is not
large enough to permit rejection of the null hypothesis. Consequently, at the $\alpha = .05$
level of significance, there is not sufficient evidence to indicate a difference between
$\mu_1$ and $\mu_2$. The attained significance level is given by p-value = P(F > 2.88).
According to Table 7, Appendix 3, p-value > .10. The applet F-Ratio Probabilities
and Quantiles gives the exact p-value = P(F > 2.88) = .12054.

The purpose of this example is to illustrate the computations involved in a simple
ANOVA. The F test for comparing two means is equivalent to a two-sample t test
because the square of a t-distributed random variable with $\nu$ df has an F distribution
with 1 numerator degree of freedom and $\nu$ denominator degrees of freedom. You
can easily verify that the square of $t_{.025} = 2.228$ (used for the two-tailed test with
$\alpha = .05$ and $\nu = 10$ df) is equal to $F_{.05} = 4.96$. Had the t test been used for
Example 13.1, we would have obtained t = −1.6967, which satisfies the relationship
$t^2 = (-1.6967)^2 = 2.88 = F$.
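The arithmetic of Example 13.1 can be reproduced in a few lines. The sketch below is a plain-Python illustration, not part of the original solution; the observations are entered so that they agree with the totals quoted in the solution ($\sum y = 91.9$ and $\sum y^2 = 711.35$), and the variable names are arbitrary.

```python
# Observations for processes A and B (Table 13.1), chosen to match the
# totals quoted in the solution: sum = 91.9, sum of squares = 711.35.
a = [6.1, 7.1, 7.8, 6.9, 7.6, 8.2]
b = [9.1, 8.2, 8.6, 6.9, 7.5, 7.9]


def mean(xs):
    return sum(xs) / len(xs)


n1 = len(a)                 # common sample size, n1 = n2 = 6
grand = mean(a + b)         # overall mean of all 2 * n1 observations

total_ss = sum((y - grand) ** 2 for y in a + b)
sst = n1 * ((mean(a) - grand) ** 2 + (mean(b) - grand) ** 2)
sse = sum((y - mean(a)) ** 2 for y in a) + sum((y - mean(b)) ** 2 for y in b)

mst = sst / 1               # 1 numerator degree of freedom
mse = sse / (2 * n1 - 2)    # 10 denominator degrees of freedom
f_stat = mst / mse

# Pooled two-sample t statistic for the same data (MSE is the pooled variance).
t_stat = (mean(a) - mean(b)) / (mse * (1 / n1 + 1 / n1)) ** 0.5

print(round(total_ss, 4), round(sst, 4), round(sse, 4))
print(round(f_stat, 2), round(t_stat, 4), round(t_stat ** 2, 2))
```

The printed values should agree with the solution: the three sums of squares, F = 2.88, and a t statistic whose square equals F.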
Exercises 

13.1 The reaction times for two different stimuli in a psychological word-association experiment 
were compared by using each stimulus on independent random samples of size 8. Thus, a total 
of 16 people were used in the experiment. Do the following data present sufficient evidence to 
indicate that there is a difference in the mean reaction times for the two stimuli? 

Stimulus 1 | 1 3-2. Tq. 2- 4 3 2 
Stimulus2 | 4 2 3 3 1 2 3 3 
a Use the ANOVA approach to test the appropriate hypotheses. Test at the $\alpha = .05$ level of
significance.
b Applet Exercise Use the applet F-Ratio Probabilities and Quantiles to determine the exact
p-value for the test in part (a).
c Test the appropriate hypotheses by using the two-sample t test for comparing population
means, which we developed in Section 10.8. Compare the value of the t statistic to the
value of the F statistic calculated in part (a).
d What assumptions are necessary for the tests implemented in the preceding parts?
13.2 Refer to Exercises 8.90 and 10.77.

a Use an F test to determine whether there is sufficient evidence to claim a difference in the
mean verbal SAT scores for high school students who intend to major in engineering and
language/literature. Give bounds for the associated p-value. What would you conclude at
the $\alpha = .05$ level of significance?

b Applet Exercise Use the applet F-Ratio Probabilities and Quantiles to determine the exact
p-value for the test in part (a).

c How does the value of the F statistic obtained in part (a) compare to the value of the t
statistic that you obtained in Exercise 10.77?

d What assumptions are necessary for the analyses performed in part (a)?


13.3 Comparison of More Than Two Means:
Analysis of Variance for a One-Way Layout


An ANOVA to compare more than two population means is a simple generalization of 
the ANOVA presented in Section 13.2. The random selection of independent samples 
from k populations is known as a one-way layout. As indicated in Section 12.4, the data 
in a one-way layout may correspond to data obtained from a completely randomized 
experimental design (see Definition 12.4) or from taking independent samples from
each of several existing populations. 

Assume that independent random samples have been drawn from k normal
populations with means $\mu_1, \mu_2, \ldots, \mu_k$, respectively, and common variance $\sigma^2$. To be
completely general, we will allow the sample sizes to be unequal and let $n_i$, for
i = 1, 2, ..., k, be the number of observations in the sample drawn from the ith
population. The total number of observations in the experiment is $n = n_1 + n_2 + \cdots + n_k$.

Let $Y_{ij}$ denote the response for the jth experimental unit in the ith sample and
let $Y_{i\bullet}$ and $\bar{Y}_{i\bullet}$ represent the total and mean, respectively, of the $n_i$ responses in
the ith sample. The dot in the second position in the subscript of $Y_{i\bullet}$ is intended to
remind you that this quantity is computed by summing over all possible values of the
subscript that is replaced by the dot (j, in this case). Similarly the subscripts of $\bar{Y}_{i\bullet}$
indicate that this mean is calculated by averaging the values in the ith sample. Thus,
for i = 1, 2, ..., k,

$$Y_{i\bullet} = \sum_{j=1}^{n_i} Y_{ij} \quad \text{and} \quad \bar{Y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} = \left(\frac{1}{n_i}\right) Y_{i\bullet}.$$

This modification in the symbols for sample totals and averages will simplify the
computing formulas for the sums of squares.
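Then, as in the ANOVA involving two means, we have

$$\text{Total SS} = \text{SST} + \text{SSE}$$

(proof deferred to Section 13.6), where

$$\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \text{CM},$$

$$\text{CM} = \frac{(\text{total of all observations})^2}{n} = \frac{1}{n} \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2 = n \bar{Y}^2$$

(the symbol CM denotes correction for the mean),

$$\text{SST} = \sum_{i=1}^{k} n_i (\bar{Y}_{i\bullet} - \bar{Y})^2 = \sum_{i=1}^{k} \frac{Y_{i\bullet}^2}{n_i} - \text{CM},$$

$$\text{SSE} = \text{Total SS} - \text{SST}.$$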


Although the easy way to compute SSE is by subtraction, as shown earlier, it is
interesting to observe that SSE is the pooled sum of squares for all k samples and is
equal to

$$\text{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2 = \sum_{i=1}^{k} (n_i - 1) S_i^2,$$

where

$$S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2.$$

Notice that SSE is a function of only the sample variances $S_i^2$, for i = 1, 2, ..., k.
Because each of the $S_i^2$ values provides an unbiased estimator for $\sigma_i^2 = \sigma^2$ with $n_i - 1$ df,
an unbiased estimator of $\sigma^2$ based on $(n_1 + n_2 + \cdots + n_k - k) = n - k$ df is
provided by

$$S^2 = \text{MSE} = \frac{\text{SSE}}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{\text{SSE}}{n - k}.$$

Because

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{Y}_{i\bullet},$$

it follows that SST is a function of only the sample means $\bar{Y}_{i\bullet}$, for i = 1, 2, ..., k.
The MST possesses (k − 1) df (that is, 1 less than the number of means) and is

$$\text{MST} = \frac{\text{SST}}{k - 1}.$$


To test the null hypothesis,

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$

against the alternative that at least one of the equalities does not hold, we compare
MST with MSE, using the F statistic based on $\nu_1 = k - 1$ and $\nu_2 = n - k$ numerator and
denominator degrees of freedom, respectively. The null hypothesis will be rejected if

$$F = \frac{\text{MST}}{\text{MSE}} > F_\alpha,$$

where $F_\alpha$ is the critical value of F for a test of level $\alpha$. In Exercise 13.6, you will prove
that, under $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, the statistic F possesses an F distribution with
k − 1 and n − k numerator and denominator degrees of freedom, respectively.
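The computing formulas of this section translate directly into code. The function below is a minimal sketch, not a reference implementation: the name `one_way_anova` and the small data set are hypothetical, chosen only for illustration. It computes SST and SSE through the correction for the mean and cross-checks SSE against the pooled within-sample sum of squares.

```python
def one_way_anova(samples):
    """Return (SST, SSE, MST, MSE, F, df1, df2) for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)

    cm = grand_total ** 2 / n                                # correction for the mean
    total_ss = sum(y ** 2 for s in samples for y in s) - cm
    sst = sum(sum(s) ** 2 / len(s) for s in samples) - cm    # sum of Y_i.^2 / n_i, minus CM
    sse = total_ss - sst                                     # by subtraction

    # Cross-check: SSE is also the pooled within-sample sum of squares.
    pooled = 0.0
    for s in samples:
        m = sum(s) / len(s)
        pooled += sum((y - m) ** 2 for y in s)
    assert abs(sse - pooled) < 1e-8 * max(1.0, total_ss)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return sst, sse, mst, mse, mst / mse, k - 1, n - k


# Hypothetical data: three samples of unequal size.
demo = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
sst, sse, mst, mse, f_stat, df1, df2 = one_way_anova(demo)
print(sst, sse, f_stat, df1, df2)
```

For the hypothetical data the sample means are 5, 8, and 2, each sample contributes a within-sample sum of squares of 2, and the function returns SST = 50, SSE = 6, and F = 25 on 2 and 6 degrees of freedom. The observed F would then be compared with a tabled critical value $F_\alpha$.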
In keeping with our previous conventions, we will use the notation $y_{ij}$ to denote the
observed value of $Y_{ij}$. Similarly, we will use $y_{i\bullet}$ and $\bar{y}_{i\bullet}$ to denote the observed values
of $Y_{i\bullet}$ and $\bar{Y}_{i\bullet}$, for i = 1, 2, ..., k, respectively. Intuitively, the greater the differences
among the observed values of the treatment means, $\bar{y}_{1\bullet}, \bar{y}_{2\bullet}, \ldots, \bar{y}_{k\bullet}$, the greater is
the evidence to indicate a difference among the corresponding population means. If
all of the treatment means are identical, $\bar{y}_{1\bullet} = \bar{y}_{2\bullet} = \cdots = \bar{y}_{k\bullet} = \bar{y}$, and all of the
differences that appear in the preceding expression for SST equal zero, implying that
SST = 0. As the treatment means get farther apart, the deviations $(\bar{y}_{i\bullet} - \bar{y})$ increase in
absolute value and the observed value of SST increases in magnitude. Consequently,
the larger the observed value of SST, the greater is the weight of evidence favoring
rejection of the null hypothesis. This same line of reasoning applies to the F tests
employed in the ANOVA for all designed experiments.

The assumptions underlying the ANOVA F tests deserve particular attention.
Independent random samples are assumed to have been selected from the k
populations. The k populations are assumed to be normally distributed with variances
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$ and means $\mu_1, \mu_2, \ldots, \mu_k$. Moderate departures from
these assumptions will not seriously affect the properties of the test. This is
particularly true of the normality assumption. The assumption of equal population variances
is less critical if the sizes of the samples from the respective populations are all equal
($n_1 = n_2 = \cdots = n_k$). A one-way layout with equal numbers of observations per
treatment is said to be balanced.


EXAMPLE 13.2

Four groups of students were subjected to different teaching techniques and tested at
the end of a specified period of time. As a result of dropouts from the experimental
groups (due to sickness, transfer, etc.), the number of students varied from group
to group. Do the data shown in Table 13.2 present sufficient evidence to indicate a
difference in mean achievement for the four teaching techniques?

Table 13.2 Data for Example 13.2

                          1        2        3        4
                         65       75       59       94
                         87       69       78       89
                         73       83       67       80
                         79       81       62       88
                         81       72       83
                         69       79       76
                                  90
$y_{i\bullet}$          454      549      425      351
$n_i$                     6        7        6        4
$\bar{y}_{i\bullet}$  75.67    78.43    70.83    87.75


Solution The observed values of the quantities necessary to compute the value of the F
statistic are

$$\text{CM} = \frac{1}{n} \left( \sum_{i=1}^{4} \sum_{j=1}^{n_i} y_{ij} \right)^2 = \frac{(1779)^2}{23} = 137{,}601.8,$$

$$\text{Total SS} = \sum_{i=1}^{4} \sum_{j=1}^{n_i} y_{ij}^2 - \text{CM} = 139{,}511 - 137{,}601.8 = 1909.2,$$

$$\text{SST} = \sum_{i=1}^{4} \frac{y_{i\bullet}^2}{n_i} - \text{CM} = 138{,}314.4 - 137{,}601.8 = 712.6,$$

$$\text{SSE} = \text{Total SS} - \text{SST} = 1196.6.$$

The observed values of MST and MSE are

$$\text{MST} = \frac{\text{SST}}{k - 1} = \frac{712.6}{3} = 237.5, \qquad \text{MSE} = \frac{\text{SSE}}{n - k} = \frac{1196.6}{19} = 63.0.$$

Finally, the observed value of the test statistic for testing the null hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ is

$$F = \frac{\text{MST}}{\text{MSE}} = \frac{237.5}{63.0} = 3.77,$$

where the appropriate numerator and denominator degrees of freedom are $\nu_1 =
k - 1 = 3$ and $\nu_2 = n - k = (6 + 7 + 6 + 4) - 4 = 19$, respectively.

The attained significance level is given by p-value = P(F > 3.77). Using Table 7,
Appendix 3, with 3 numerator and 19 denominator degrees of freedom, we see that
.025 < p-value < .05. Thus, if we choose $\alpha = .05$ (or any larger value), we reject
the null hypothesis and conclude that there is sufficient evidence to indicate a
difference in mean achievement among the four teaching procedures. The applet F-Ratio
Probabilities and Quantiles can be used to establish that the exact p-value = P(F >
3.77) = .02808.


You may feel that this conclusion could have been made on the basis of visual
observation of the treatment means. However, it is not difficult to construct a set of
data that will lead the visual decision maker to erroneous results.


13.4 An Analysis of Variance Table
for a One-Way Layout


The calculations for an ANOVA are usually displayed in an ANOVA (or AOV) table. 
The table for the design in Section 13.3 for comparing k treatment means is shown in 
Table 13.3. The first column shows the source associated with each sum of squares; 
the second column gives the respective degrees of freedom; the third and fourth 
columns give the sums of squares and mean squares, respectively. A calculated value
of F, comparing MST and MSE, is usually shown in the fifth column. Notice that
SST + SSE = Total SS and that the sum of the degrees of freedom for treatments and 
error equals the total number of degrees of freedom. 
Table 13.3 ANOVA table for a one-way layout

Source        df       SS        MS                    F
Treatments    k - 1    SST       MST = SST/(k - 1)     MST/MSE
Error         n - k    SSE       MSE = SSE/(n - k)
Total         n - 1    $\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$


Table 13.4 ANOVA table for Example 13.2

Source        df       SS        MS       F
Treatments     3     712.6    237.5    3.77
Error         19    1196.6     63.0
Total         22    1909.2

The ANOVA table for Example 13.2, shown in Table 13.4, gives a compact pre- 
sentation of the appropriate computed quantities for the analysis of variance. 
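As a check on Table 13.4, the entries can be regenerated from the Table 13.2 data. The following is a small illustrative script (the layout and variable names are arbitrary); it applies the computing formulas of Section 13.3 and prints the rows in the order used by the ANOVA table.

```python
# Table 13.2 data: achievement scores for the four teaching techniques.
groups = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}

k = len(groups)
n = sum(len(g) for g in groups.values())
cm = sum(sum(g) for g in groups.values()) ** 2 / n            # correction for the mean
total_ss = sum(y * y for g in groups.values() for y in g) - cm
sst = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cm
sse = total_ss - sst
mst, mse = sst / (k - 1), sse / (n - k)

rows = [
    ("Treatments", k - 1, sst, f"{mst:.1f}", f"{mst / mse:.2f}"),
    ("Error", n - k, sse, f"{mse:.1f}", ""),
    ("Total", n - 1, total_ss, "", ""),
]
print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>8}{'F':>7}")
for source, df, ss, ms, f in rows:
    print(f"{source:<12}{df:>4}{ss:>10.1f}{ms:>8}{f:>7}")
```

The printed sums of squares, mean squares, and F ratio should match Table 13.4 to the precision shown there. The p-value is not computed here because it requires the F distribution function, which the standard library does not provide.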


Exercises

13.3 State the assumptions underlying the ANOVA of a completely randomized design.

13.4 Refer to Example 13.2. Calculate the value of SSE by pooling the sums of squares of deviations
within each of the four samples and compare the answer with the value obtained by
subtraction. This is an extension of the pooling procedure used in the two-sample case discussed in
Section 13.2.

*13.5 In Exercise 6.59, we showed that if $Y_1$ and $Y_2$ are independent $\chi^2$-distributed random variables
with $\nu_1$ and $\nu_2$ df, respectively, then $Y_1 + Y_2$ has a $\chi^2$ distribution with $\nu_1 + \nu_2$ df. Now suppose
that W = U + V, where U and V are independent random variables, and that W and V have $\chi^2$
distributions with r and s df, respectively, where r > s. Use the method of moment-generating
functions to prove that U must have a $\chi^2$ distribution with r − s df.¹

13.6 Suppose that independent samples of sizes $n_1, n_2, \ldots, n_k$ are taken from each of k normally
distributed populations with means $\mu_1, \mu_2, \ldots, \mu_k$ and common variances, all equal to $\sigma^2$. Let
$Y_{ij}$ denote the jth observation from population i, for j = 1, 2, ..., $n_i$ and i = 1, 2, ..., k, and
let $n = n_1 + n_2 + \cdots + n_k$.

a Recall that

$$\text{SSE} = \sum_{i=1}^{k} (n_i - 1) S_i^2, \quad \text{where} \quad S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2.$$

Argue that SSE/$\sigma^2$ has a $\chi^2$ distribution with $(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = n - k$ df.

b Argue that under the null hypothesis, $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, all the $Y_{ij}$'s are
independent, normally distributed random variables with the same mean and variance. Use
Theorem 7.3 to argue further that, under the null hypothesis,

$$\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$$

is such that (Total SS)/$\sigma^2$ has a $\chi^2$ distribution with n − 1 df.

c In Section 13.3, we argued that SST is a function of only the sample means and that SSE is
a function of only the sample variances. Hence, SST and SSE are independent. Recall that
Total SS = SST + SSE. Use the results of Exercise 13.5 and parts (a) and (b) to show that,
under the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, SST/$\sigma^2$ has a $\chi^2$ distribution with k − 1 df.

d Use the results of parts (a)-(c) to argue that, under the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$,
F = MST/MSE has an F distribution with k − 1 and n − k numerator and denominator
degrees of freedom, respectively.

1. Exercises preceded by an asterisk are optional.


13.7 Four chemical plants, producing the same products and owned by the same company, discharge
effluents into streams in the vicinity of their locations. To monitor the extent of pollution created
by the effluents and to determine whether this differs from plant to plant, the company collected
random samples of liquid waste, five specimens from each plant. The data are given in the
accompanying table.

Plant   Polluting Effluents (lb/gal of waste)
A       1.65   1.72   1.50   1.37   1.60
B       1.70   1.85   1.46   2.05   1.80
C       1.40   1.75   1.38   1.65   1.55
D       2.10   1.95   1.65   1.88   2.00

a Do the data provide sufficient evidence to indicate a difference in the mean weight of
effluents per gallon in the effluents discharged from the four plants? Test using $\alpha = .05$.

b Applet Exercise Find the p-value associated with the test in part (a) using the applet
F-Ratio Probabilities and Quantiles.


13.8 In a study of starting salaries for assistant professors, five male assistant professors at each of
three types of doctoral-granting institutions were randomly polled and their starting salaries
were recorded under the condition of anonymity. The results of the survey (measured in $1000)
are given in the following table.²

Public Universities   Private-Independent   Church-Affiliated
       49.3                  81.8                  66.9
       49.9                  71.2                  57.3
       48.5                  62.9                  97.7
       68.5                  69.0                  46.2
       54.0                  69.0                  52.2

a What type of experimental design was utilized when the data were collected?

b Is there sufficient evidence to indicate a difference in the average starting salaries of assistant
professors at the three types of doctoral-granting institutions? Use the table in the text to
bound the p-value.

c Applet Exercise Determine the exact p-value by using the applet F-Ratio Probabilities
and Quantiles.

2. Source: Adapted from "Average Salary for Men and Women Faculty, by Category, Affiliation, and
Academic Rank 2002-2003," Academe: Bulletin of the American Association of University Professors,
March-April 2003, 37.


13.9 In a comparison of the strengths of concrete produced by four experimental mixes, three
specimens were prepared from each type of mix. Each of the 12 specimens was subjected to
increasingly compressive loads until breakdown. The accompanying table gives the compressive
loads, in tons per square inch, attained at breakdown. Specimen numbers 1 to 12 are indicated in
parentheses for identification purposes.


Mix A Mix B Mix C Mix D 


(02.0 (2220 (215  (4)225 
(52.20 (62.0 (0245 (8) 2.15 
(9) 2.25 (10)220 (10220 (12) 2.25 


a Assuming that the requirements for a one-way layout are met, analyze the data. State
whether there is statistical support at the $\alpha = .05$ level of significance for the conclusion
that at least one of the concretes differs in average strength from the others.

b Applet Exercise Use the applet F-Ratio Probabilities and Quantiles to find the p-value
associated with the test in part (a).


13.10 A clinical psychologist wished to compare three methods for reducing hostility levels in
university students. A psychological test (HLT) was used to measure the degree of hostility. High
scores on this test indicate great hostility. Eleven students obtaining high and nearly equal
scores were used in the experiment. Five were selected at random from among the 11 problem
cases and treated by method A. Three were taken at random from the remaining 6 students and
treated by method B. The other 3 students were treated by method C. All treatments continued
throughout a semester. Each student was given the HLT test again at the end of the semester,
with the results shown in the accompanying table.

Method A   Method B   Method C
   73         54         79
   83         74         95
   76         71         87
   68
   80

a Do the data provide sufficient evidence to indicate that at least one of the methods of
treatment produces a mean student response different from the other methods? Give bounds for
the attained significance level.

b Applet Exercise Find the exact p-value by using the applet F-Ratio Probabilities and
Quantiles.

c What would you conclude at the $\alpha = .05$ level of significance?


13.11 It is believed that women in the postmenopausal phase of life suffer from calcium
deficiency. This phenomenon is associated with the relatively high proportion of bone fractures
for women in that age group. Is this calcium deficiency associated with an estrogen deficiency, a
condition that occurs after menopause? To investigate this theory, L. S. Richelson and
colleagues³ compared the bone mineral density in three groups of women.

The first group of 14 women had undergone oophorectomy (the surgical removing of ovaries)
during young adult womanhood and had lived for a period of 15 to 25 years with an estrogen
deficiency. A second group, identified as premenopausal, were approximately the same age
(approximately 50 years) as the oophorectomy group except that the women had never suffered
a period of estrogen deficiency. The third group of 14 women were postmenopausal and had
suffered an estrogen deficiency for an average of 20 years. The mean and standard error of the
mean for the three samples of lumbar spine bone-density measurements (14 measurements in
each sample, one for each subject) are recorded in the following table.

Group                          Mean   Standard Error
Oophorectomized (Group 1)      0.93        0.04
Premenopausal (Group 2)        1.21        0.03
Postmenopausal (Group 3)       0.92        0.04

a Is there sufficient evidence to permit us to conclude that the mean bone-density
measurements differ for the three groups of women? What is the p-value associated with your
test?

b What would you conclude at the $\alpha = .05$ level?


13.12 If vegetables intended for human consumption contain any pesticides at all, these pesticides
should occur in minute quantities. Detection of pesticides in vegetables sent to market is
accomplished by using solvents to extract the pesticides from the vegetables and then performing tests
on this extract to isolate and quantify the pesticides present. The extraction process is thought
to be adequate because, if known amounts of pesticides are added to "clean" vegetables in
a laboratory environment, essentially all the pesticide can be recovered from the artificially
contaminated extract.

The following data were obtained from a study by Willis Wheeler and colleagues,⁴ who
sought to determine whether the extraction process is also effective when used in the more
realistic situation where pesticides are applied to vegetable crops. Dieldrin (a commonly used
pesticide) labeled with (radioactive) carbon-14 was applied to growing radishes. Fourteen days
later, the extraction process was used, and the extracts were analyzed for pesticide content. A
liquid scintillation counter was used to determine the amount of carbon-14 present in the extract
and also the amount left behind in the vegetable pulp. Because the vegetable pulp typically
is discarded when analyzing for pesticides, if an appreciable proportion of pesticide remains
in this pulp, a serious underassessment of the amount of pesticide could result. The pesticide
was the only source of carbon-14; thus, the proportion of carbon-14 in the pulp is likely to be
indicative of the proportion of pesticide in the pulp. The following table shows a portion of the
data that the researchers obtained when low, medium, and high concentrations of the solvent,
acetonitrile, were used in the extraction process.

Percentage of carbon-14 in vegetable pulp

           Concentration of Acetonitrile
           Low      Medium     High
          23.37     20.39     18.87
          25.13     20.87     19.69
          23.78     20.78     19.29
          27.74     20.19     18.10
          25.30     20.01     18.42
          25.21     20.23     19.33
          22.12     20.73     17.26
          20.96     19.53     18.09
          23.11     18.87     18.69
          22.57     18.17     18.82
          24.59     23.34     18.72
          23.70     22.45     18.75
Total    287.58    245.56    224.03

a Is there sufficient evidence that the mean percentage of carbon-14 remaining in the vegetable
pulp differs for the different concentrations of acetonitrile used in the extraction process?
Give bounds for, or use the appropriate applet to determine, the attained significance level.
What would you conclude at the $\alpha = .01$ level of significance?

b What assumptions are necessary to validly employ the analysis that you performed in
part (a)? Relate the necessary assumptions to the specific application represented in this
exercise.

3. Source: L. S. Richelson, H. W. Wahner, L. J. Melton III, and B. L. Riggs, "Relative Contributions of
Aging and Estrogen Deficiency to Postmenopausal Bone Loss," New England Journal of Medicine 311(20)
(1984): 1273-1275.

4. Source: Willis B. Wheeler, N. P. Thompson, R. L. Edelstein, R. C. Littel, and R. T. Krause, "Influence
of Various Solvent-Water Mixtures on the Extraction of Dieldrin and Methomyl Residues from Radishes,"
Journal of the Association of Official Analytical Chemists 65(5) (1982): 1112-1117.


One portion of the research described in a paper by Yean-Jye Lu? involved an evaluation of 
maneuver times for vehicles of various sizes that were involved in making a left turn at an 
intersection with a separate left-turn lane but without a separate left-turn phase on the traffic 
light governing the intersection (an “unprotected” left-turn maneuver). The maneuver time was 
measured from the instant that a vehicle entered the opposing lanes of traffic until it completely 
cleared the intersection. Four-cylinder automobiles were classified as “small cars" and six- 
or eight-cylinder automobiles as “large cars.” Trucks and buses were combined to form a 
third category identified as “truck or bus.” Other motorized vehicles (motorcycles, etc.) were 
ignored in the study. A summary of the data, giving maneuver times (in seconds) for vehicles 
that attempted the left-turn maneuver from a standing stop, appears in the accompanying table. 


Vehicle Type Sample Size Mean Standard Deviation 


Small car 45 4.59 0.70 
Large car 102 4.88 0.64 
Truck or bus 18 6.24 0.90 


a Is there sufficient evidence to claim that the mean maneuver times differ for the three 
vehicle types? Give bounds for the attained significance level. 


b Indicate the appropriate conclusion for an $\alpha = .05$ level test.


5. Source: Yean-Jye Lu, “A Study of Left-Turn Maneuver Time for Signalized Intersections,” ITE Journal
54 (October 1984): 42-47. Institute of Transportation Engineers, Washington, D.C., ©1984 I.T.E. All
rights reserved.
13.14 The Florida Game and Fish Commission desires to compare the amounts of residue from
three chemicals found in the brain tissue of brown pelicans. Independent random samples
of ten pelicans each yielded the accompanying results (measurements in parts per million). Is
there evidence of sufficient differences among the mean residue amounts, at the 5% level of
significance?

                           Chemical
Statistic             DDE     DDD     DDT
Mean                  .032    .022    .041
Standard deviation    .014    .008    .017


13.15 Water samples were taken at four different locations in a river to determine whether the quantity
of dissolved oxygen, a measure of water pollution, differed from one location to another. 
Locations 1 and 2 were selected above an industrial plant, one near the shore and the other in 
midstream; location 3 was adjacent to the industrial water discharge for the plant; and location 
4 was slightly downriver in midstream. Five water specimens were randomly selected at each 
location, but one specimen, from location 4, was lost in the laboratory. The data are shown 
in the accompanying table (the greater the pollution, the lower will be the dissolved oxygen 
readings). Do the data provide sufficient evidence to indicate a difference in mean dissolved 
oxygen content for the four locations? Give bounds for the attained significance level. 


Location    Dissolved Oxygen Content
1           5.9    6.1    6.3    6.1    6.0
2           6.3    6.6    6.4    6.4    6.5
3           4.8    4.3    5.0    4.7    5.1
4           6.0    6.2    6.1    5.8


13.16 An experiment was conducted to examine the effect of age on heart rate when subjects perform
a specific amount of exercise. Ten male subjects were randomly selected from each of four age groups:
10-19, 20-39, 40-59, and 60-69. Each subject walked a treadmill at a fixed grade for a period
of 12 minutes, and the increase in heart rate (the difference in rates before and after exercise)
was recorded (in beats per minute). Preliminary calculations yielded Total SS = 1002.975 and
SST = 67.475.


a Construct the associated ANOVA table.


b Do the data provide sufficient evidence to indicate differences in mean increase in heart
rate among the four age groups? Test by using $\alpha = .05$.


13.5 A Statistical Model for the One-Way Layout


As earlier, we let $Y_{ij}$ denote the random variables that generate the observed values $y_{ij}$,
for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$. The $Y_{ij}$-values correspond to independent
random samples from normal populations with $E(Y_{ij}) = \mu_i$ and $V(Y_{ij}) = \sigma^2$, for
$i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$. Let us consider the random sample drawn from
population 1 and write

\[ Y_{1j} = \mu_1 + \varepsilon_{1j}, \qquad j = 1, 2, \ldots, n_1. \]
Equivalently,

\[ \varepsilon_{1j} = Y_{1j} - \mu_1, \qquad j = 1, 2, \ldots, n_1. \]

Because $\varepsilon_{1j}$ is the difference between a normally distributed random variable and
its mean, it follows that $\varepsilon_{1j}$ is normally distributed with $E(\varepsilon_{1j}) = 0$ and
$V(\varepsilon_{1j}) = V(Y_{1j}) = \sigma^2$. Further, the independence of $Y_{1j}$, for
$j = 1, 2, \ldots, n_1$, implies that $\varepsilon_{1j}$, for $j = 1, 2, \ldots, n_1$, are mutually
independent random variables. For each $i = 1, 2, \ldots, k$, we can proceed in an analogous manner to write

\[ Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i, \]

where the “error terms” $\varepsilon_{ij}$ are independent, normally distributed random variables
with $E(\varepsilon_{ij}) = 0$ and $V(\varepsilon_{ij}) = \sigma^2$, for $i = 1, 2, \ldots, k$ and
$j = 1, 2, \ldots, n_i$. The error terms simply represent the difference between the observations in
each sample and the corresponding population means.

One more set of considerations will lead to the classical model for the one-way
layout. Consider the means $\mu_i$, for $i = 1, 2, \ldots, k$, and write

\[ \mu_i = \mu + \tau_i, \qquad \text{where } \tau_1 + \tau_2 + \cdots + \tau_k = 0. \]

Notice that $\sum_{i=1}^{k} \mu_i = k\mu + \sum_{i=1}^{k} \tau_i = k\mu$, and hence
$\mu = k^{-1} \sum_{i=1}^{k} \mu_i$ is just the average of the $k$ population means (the
$\mu_i$-values). For this reason, $\mu$ is generally referred to as the overall mean. Since, for
$i = 1, 2, \ldots, k$, $\tau_i = \mu_i - \mu$ quantifies the difference between the mean for
population $i$ and the overall mean, $\tau_i$ is usually referred to as the effect of treatment
(or population) $i$. Finally, we present the classical model for the one-way layout.


Statistical Model for a One-Way Layout
For $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$,

\[ Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \]

where $Y_{ij}$ = the $j$th observation from population (treatment) $i$,
$\mu$ = the overall mean,
$\tau_i$ = the nonrandom effect of treatment $i$, where $\sum_{i=1}^{k} \tau_i = 0$,
$\varepsilon_{ij}$ = random error terms such that the $\varepsilon_{ij}$ are independent normally
distributed random variables with $E(\varepsilon_{ij}) = 0$ and $V(\varepsilon_{ij}) = \sigma^2$.


The advantage of this model is that it very clearly summarizes all the assumptions 
made in the analysis of the data obtained from a one-way layout. It also gives us a 
basis for presenting a precise statistical model for the randomized block design. (See 
Section 13.8.) 
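
The decomposition $\mu_i = \mu + \tau_i$ used in the model can be illustrated numerically. The following is a minimal Python sketch, not part of the original text; the function name `decompose_means` and the three population means are hypothetical and chosen only for illustration.

```python
def decompose_means(pop_means):
    """Split population means mu_i into the overall mean mu and effects tau_i."""
    k = len(pop_means)
    mu = sum(pop_means) / k                  # mu = (1/k) * sum of the mu_i
    taus = [m - mu for m in pop_means]       # tau_i = mu_i - mu
    return mu, taus


# Hypothetical population means for k = 3 treatments (illustrative values only).
mu, taus = decompose_means([10.0, 12.0, 17.0])
print("overall mean:", mu)
print("treatment effects:", taus)
print("sum of effects:", sum(taus))
```

By construction the effects sum to zero, which is the constraint $\sum \tau_i = 0$ stated in the model.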

Notice that (see Exercise 13.19) $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ can be restated as

\[ H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 \]

and that $H_a: \mu_i \ne \mu_{i'}$ for some $i \ne i'$ is equivalent to $H_a: \tau_i \ne 0$ for some $i$,
$1 \le i \le k$. Thus, the F test for equality of means that we presented in Section 13.3
is the test of the hypotheses

\[ H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 \quad \text{versus} \quad H_a: \tau_i \ne 0 \text{ for some } i,\ 1 \le i \le k. \]


Exercises

13.17 Let $\bar{Y}_{i\bullet}$ denote the average of all of the responses to treatment $i$. Use the model
for the one-way layout to derive $E(\bar{Y}_{i\bullet})$ and $V(\bar{Y}_{i\bullet})$.

13.18 Refer to Exercise 13.17 and consider $\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ for $i \ne i'$.

a Show that $E(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) = \mu_i - \mu_{i'} = \tau_i - \tau_{i'}$. This result
implies that $\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ is an unbiased estimator of the difference in the
effects of treatments $i$ and $i'$.
b Derive $V(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet})$.

13.19 Refer to the statistical model for the one-way layout.

a Show that $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$ is equivalent to $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.
b Show that $H_a: \tau_i \ne 0$ for at least one $i$ is equivalent to $H_a: \mu_i \ne \mu_{i'}$ for some $i \ne i'$.


13.6 Proof of Additivity of the Sums of Squares and E(MST) for a One-Way Layout (Optional)


The proof that 
Total SS = SST + SSE 


for the one-way layout is presented in this section for the benefit of those who are 
interested. It may be omitted without loss of continuity. 

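The proof uses elementary results on summations that appear in the exercises for
Chapter 1 and the device of adding and subtracting $\bar{Y}_{i\bullet}$ within the expression for the
Total SS. Thus,

\[
\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet} + \bar{Y}_{i\bullet} - \bar{Y})^2
\]
\[
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (Y_{ij} - \bar{Y}_{i\bullet})^2
+ 2 (Y_{ij} - \bar{Y}_{i\bullet})(\bar{Y}_{i\bullet} - \bar{Y})
+ (\bar{Y}_{i\bullet} - \bar{Y})^2 \right].
\]

Summing first over $j$, we obtain

\[
\text{Total SS} = \sum_{i=1}^{k} \left[ \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2
+ 2 (\bar{Y}_{i\bullet} - \bar{Y}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})
+ n_i (\bar{Y}_{i\bullet} - \bar{Y})^2 \right],
\]

where

\[
\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet}) = Y_{i\bullet} - n_i \bar{Y}_{i\bullet} = Y_{i\bullet} - Y_{i\bullet} = 0.
\]

Consequently, the middle term in the expression for the Total SS is equal to zero.
Then, summing over $i$, we obtain

\[
\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2
+ \sum_{i=1}^{k} n_i (\bar{Y}_{i\bullet} - \bar{Y})^2 = \text{SSE} + \text{SST}.
\]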

Proof of the additivity of the ANOVA sums of squares for other experimental 
designs can be obtained in a similar manner although the procedure is often tedious. 
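
The identity Total SS = SST + SSE can also be checked numerically. The sketch below is an illustration rather than part of the proof: it computes the three sums of squares directly from their definitions for a small, hypothetical, unbalanced data set, with each inner list playing the role of one sample. The function name `one_way_sums_of_squares` is an assumption of this sketch.

```python
def one_way_sums_of_squares(samples):
    """Return (Total SS, SST, SSE) for a one-way layout, from the definitions."""
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    # Total SS: squared deviations of every observation from the grand mean.
    total_ss = sum((y - grand_mean) ** 2 for s in samples for y in s)
    # SST: n_i times the squared deviation of each sample mean from the grand mean.
    sst = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # SSE: squared deviations of each observation from its own sample mean.
    sse = sum((y - sum(s) / len(s)) ** 2 for s in samples for y in s)
    return total_ss, sst, sse


# Hypothetical data: k = 3 samples of unequal sizes (illustrative values only).
samples = [[3.0, 5.0, 4.0], [8.0, 6.0], [1.0, 2.0, 2.0, 3.0]]
total_ss, sst, sse = one_way_sums_of_squares(samples)
print(total_ss, sst, sse, sst + sse)
```

The two printed totals agree up to floating-point rounding, mirroring the vanishing of the cross-product term in the proof.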

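We now proceed with the derivation of the expected value of MST for a one-way
layout (including a completely randomized design). Using the statistical model for
the one-way layout presented in Section 13.5, it follows that

\[
\bar{Y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}
= \frac{1}{n_i} \sum_{j=1}^{n_i} (\mu + \tau_i + \varepsilon_{ij})
= \mu + \tau_i + \bar{\varepsilon}_i,
\qquad \text{where } \bar{\varepsilon}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \varepsilon_{ij}.
\]

Because the $\varepsilon_{ij}$'s are independent random variables with $E(\varepsilon_{ij}) = 0$ and
$V(\varepsilon_{ij}) = \sigma^2$, Theorem 5.12 implies (see Example 5.27) that
$E(\bar{\varepsilon}_i) = 0$ and $V(\bar{\varepsilon}_i) = \sigma^2 / n_i$.
In a completely analogous manner, $\bar{Y}$ is given by

\[
\bar{Y} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}
= \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\mu + \tau_i + \varepsilon_{ij})
= \mu + \bar{\tau} + \bar{\varepsilon},
\]

where

\[
\bar{\tau} = \frac{1}{n} \sum_{i=1}^{k} n_i \tau_i
\qquad \text{and} \qquad
\bar{\varepsilon} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \varepsilon_{ij}.
\]

Since the $\tau_i$ values are constants, $\bar{\tau}$ is simply a constant; again using Theorem 5.12,
we obtain $E(\bar{\varepsilon}) = 0$ and $V(\bar{\varepsilon}) = \sigma^2 / n$.
Therefore, with respect to the terms in the model for the one-way layout,

\[
\text{MST} = \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\bar{Y}_{i\bullet} - \bar{Y})^2
= \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\tau_i + \bar{\varepsilon}_i - \bar{\tau} - \bar{\varepsilon})^2
\]
\[
= \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\tau_i - \bar{\tau})^2
+ \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} 2 n_i (\tau_i - \bar{\tau})(\bar{\varepsilon}_i - \bar{\varepsilon})
+ \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\bar{\varepsilon}_i - \bar{\varepsilon})^2.
\]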
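Because $\bar{\tau}$ and $\tau_i$, for $i = 1, 2, \ldots, k$, are constants and
$E(\varepsilon_{ij}) = E(\bar{\varepsilon}_i) = E(\bar{\varepsilon}) = 0$, it follows that

\[
E(\text{MST}) = \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\tau_i - \bar{\tau})^2
+ \left( \frac{1}{k-1} \right) E\left[ \sum_{i=1}^{k} n_i (\bar{\varepsilon}_i - \bar{\varepsilon})^2 \right].
\]

Notice that, because $\sum_{i=1}^{k} n_i \bar{\varepsilon}_i = n \bar{\varepsilon}$,

\[
\sum_{i=1}^{k} n_i (\bar{\varepsilon}_i - \bar{\varepsilon})^2
= \sum_{i=1}^{k} \left( n_i \bar{\varepsilon}_i^2 - 2 n_i \bar{\varepsilon}_i \bar{\varepsilon} + n_i \bar{\varepsilon}^2 \right)
= \sum_{i=1}^{k} n_i \bar{\varepsilon}_i^2 - 2 n \bar{\varepsilon}^2 + n \bar{\varepsilon}^2
= \sum_{i=1}^{k} n_i \bar{\varepsilon}_i^2 - n \bar{\varepsilon}^2.
\]

Because $E(\bar{\varepsilon}_i) = 0$ and $V(\bar{\varepsilon}_i) = \sigma^2 / n_i$, it follows that
$E(\bar{\varepsilon}_i^2) = \sigma^2 / n_i$, for $i = 1, 2, \ldots, k$. Similarly,
$E(\bar{\varepsilon}^2) = \sigma^2 / n$, and, hence,

\[
E\left[ \sum_{i=1}^{k} n_i (\bar{\varepsilon}_i - \bar{\varepsilon})^2 \right]
= \sum_{i=1}^{k} n_i E(\bar{\varepsilon}_i^2) - n E(\bar{\varepsilon}^2)
= k \sigma^2 - \sigma^2 = (k - 1) \sigma^2.
\]

Summarizing, we obtain

\[
E(\text{MST}) = \sigma^2 + \left( \frac{1}{k-1} \right) \sum_{i=1}^{k} n_i (\tau_i - \bar{\tau})^2,
\qquad \text{where } \bar{\tau} = \frac{1}{n} \sum_{i=1}^{k} n_i \tau_i.
\]

Under $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$, it follows that $\bar{\tau} = 0$, and, hence,
$E(\text{MST}) = \sigma^2$. Thus, when $H_0$ is true, MST/MSE is the ratio of two unbiased estimators
for $\sigma^2$. When $H_a: \tau_i \ne 0$ for some $i$, $1 \le i \le k$, is true, the quantity
$\frac{1}{k-1} \sum_{i=1}^{k} n_i (\tau_i - \bar{\tau})^2$ is strictly positive and MST is a positively
biased estimator for $\sigma^2$.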
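
The closing formula for E(MST) is easy to evaluate for given parameter values. The following Python sketch is illustrative only; the function name `expected_mst`, the variance, the sample sizes, and the treatment effects are all hypothetical.

```python
def expected_mst(sigma2, sizes, taus):
    """E(MST) = sigma^2 + (1/(k-1)) * sum_i n_i * (tau_i - tau_bar)^2."""
    n = sum(sizes)
    k = len(sizes)
    # tau_bar is the sample-size-weighted average of the treatment effects.
    tau_bar = sum(n_i * t for n_i, t in zip(sizes, taus)) / n
    bias = sum(n_i * (t - tau_bar) ** 2 for n_i, t in zip(sizes, taus)) / (k - 1)
    return sigma2 + bias


# Hypothetical settings: sigma^2 = 4 and three samples of size 5.
under_h0 = expected_mst(4.0, [5, 5, 5], [0.0, 0.0, 0.0])    # all effects zero
under_ha = expected_mst(4.0, [5, 5, 5], [-1.0, 0.0, 1.0])   # some effects nonzero
print(under_h0, under_ha)
```

With all effects equal to zero the function returns $\sigma^2$; with nonzero effects it returns a larger value, which is the positive bias described above.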


13.7 Estimation in the One-Way Layout 


Confidence intervals for a single treatment mean and for the difference between a
pair of treatment means based on data obtained in a one-way layout (Section 13.3)
are completely analogous to those given in Chapter 8. The only difference between
the intervals in Chapter 8 and those that follow is that intervals associated with
the one-way layout use MSE (the pooled estimator based on all $k$ samples) to estimate
the population variance(s) $\sigma^2$. The confidence intervals for the mean of treatment $i$
and for the difference between the means for treatments $i$ and $i'$ are, respectively,
as follows:

\[ \bar{Y}_{i\bullet} \pm t_{\alpha/2} \frac{S}{\sqrt{n_i}} \]

and

\[ (\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}, \]

where

\[ S = \sqrt{S^2} = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n_1 + n_2 + \cdots + n_k - k}} \]

and $t_{\alpha/2}$ is based on $(n - k)$ df.


The confidence intervals just stated are appropriate for a single treatment mean 
or a comparison of a pair of means selected prior to observation of the data. These 
intervals are likely to be shorter than the corresponding intervals from Chapter 8 
because the value of $t_{\alpha/2}$ is based on a greater number of degrees of freedom ($n - k$
instead of $n_i - 1$ or $n_i + n_{i'} - 2$, respectively). The stated confidence coefficients are
appropriate for a single mean or difference in two means identified prior to observing 
the actual data. If we were to look at the data and always compare the populations 
that produced the largest and smallest sample means, we would expect the difference 
between these sample means to be larger than for a pair of means specified to be of 
interest before observing the data. 


EXAMPLE 13.3 Find a 95% confidence interval for the mean score for teaching technique 1, Example
13.2.

Solution The 95% confidence interval for the mean score is

\[ \bar{Y}_{1\bullet} \pm t_{.025} \frac{S}{\sqrt{n_1}}, \]

where $t_{.025}$ is determined for $n - k = 19$ df, or

\[ 75.67 \pm (2.093) \frac{\sqrt{63}}{\sqrt{6}} \quad \text{or} \quad 75.67 \pm 6.78. \]

Notice that if we had analyzed only the data for teaching technique 1, the value of
$t_{.025}$ would have been based on only $n_1 - 1 = 5$ df, the number of degrees of freedom
associated with $s_1$.


EXAMPLE 13.4 Find a 95% confidence interval for the difference in mean score for teaching techniques
1 and 4, Example 13.2.

Solution The 95% confidence interval is

\[ (\bar{Y}_{1\bullet} - \bar{Y}_{4\bullet}) \pm (2.093)(7.94) \sqrt{1/6 + 1/4} \quad \text{or} \quad -12.08 \pm 10.73. \]

Hence, the 95% confidence interval for $(\mu_1 - \mu_4)$ is $(-22.81, -1.35)$. At the 95%
confidence level we conclude that $\mu_4 > \mu_1$ by at least 1.35 but no more than 22.81.
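
The arithmetic in Examples 13.3 and 13.4 can be reproduced with a short Python sketch. It is only an illustration: the tabled value $t_{.025} = 2.093$ (19 df) and $S = 7.94$ are taken from the examples rather than computed, and the helper names are assumptions of this sketch.

```python
import math

T_025_19DF = 2.093   # t_{.025} with n - k = 19 df, as quoted in Example 13.3
S = 7.94             # S = sqrt(MSE), as quoted in Example 13.4


def half_width_single(t, s, n_i):
    """Half-width of the interval for one treatment mean: t * S / sqrt(n_i)."""
    return t * s / math.sqrt(n_i)


def half_width_difference(t, s, n_i, n_j):
    """Half-width for a difference of two means: t * S * sqrt(1/n_i + 1/n_j)."""
    return t * s * math.sqrt(1.0 / n_i + 1.0 / n_j)


hw1 = half_width_single(T_025_19DF, S, 6)           # technique 1, n_1 = 6
hw14 = half_width_difference(T_025_19DF, S, 6, 4)   # techniques 1 and 4
print("Example 13.3 half-width:", round(hw1, 2))
print("Example 13.4 interval:", round(-12.08 - hw14, 2), round(-12.08 + hw14, 2))
```

Rounded to two decimals, the half-widths match the 6.78 and 10.73 reported in the examples.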

Exercises 

13.20 Refer to Examples 13.2 and 13.3. 

a Use the portion of the data in Table 13.2 that deals only with teaching technique 1 and the 
method of Section 8.8 to form a 95% confidence interval for the mean score of students 
taught using technique 1. 

b How does the length of the 95% confidence interval that you found in part (a) compare to
the length of the 95% confidence interval obtained in Example 13.3? 

c What is the major reason that the interval that you found in part (a) is longer than the 
interval given in Example 13.3? 

13.21 Refer to Examples 13.2 and 13.4.

a Use the portion of the data in Table 13.2 that deals only with teaching techniques 1 and 4 
and the method of Section 8.8 to form a 95% confidence interval for the difference in mean
score for students taught using techniques 1 and 4. 

b How does the length of the 95% confidence interval that you found in part (a) compare to
the length of the 95% confidence interval obtained in Example 13.4? 

c What is the major reason that the interval that you found in part (a) is longer than the 
interval given in Example 13.4? 

13.22 a Based on your answers to Exercises 13.20 and 13.21 and the comments at the end of this
section, how would you expect confidence intervals computed using the results of this 
section to compare with related intervals that make use of the data from only one or two of 
the samples obtained in a one-way layout? Why? 

b Refer to part (a). Is it possible that a 95% confidence interval for the mean of a single 
population based only on the sample taken from that population will be shorter than the 
95% confidence interval for the same population mean that would be obtained using the 
procedure of this section? How? 

13.23 Refer to Exercise 13.7.

a Construct a 95% confidence interval for the mean amount of polluting effluent per gallon 
for plant A. If the limit for the mean amount of polluting effluent is 1.5 pound/gallon, would 
you conclude that plant A exceeds this limit? Why? 

b Give a 95% confidence interval for the difference in mean polluting effluent per gallon for 
plants A and D. Does this interval indicate that mean effluent per gallon differs for these 
two plants? Why? 

13.24 Refer to Exercise 13.8. Construct a 98% confidence interval for the difference in mean starting
salaries for assistant professors at public and private/independent doctoral-granting institutions.
13.25 Refer to Exercise 13.11. As noted in the description of the experiment, the oophorectomized
and the premenopausal groups of women were approximately the same age, but those in
the oophorectomized group suffered from an estrogen deficiency. Form a 95% confidence
interval for the difference in mean bone densities for these two groups of women. Would you
conclude that the mean bone densities for the oophorectomized and premenopausal women
were significantly different? Why?


13.26 Refer to Exercise 13.9. Let $\mu_A$ and $\mu_B$ denote the mean strengths of concrete specimens
prepared for mix A and mix B, respectively.

a Find a 90% confidence interval for $\mu_A$.
b Find a 95% confidence interval for $(\mu_A - \mu_B)$.

13.27 Refer to Exercise 13.10. Let $\mu_A$ and $\mu_B$, respectively, denote the mean scores at the end of the
semester for the populations of extremely hostile students who were treated throughout that
semester by methods A and B, respectively. Find a 95% confidence interval for

a $\mu_A$.
b $\mu_B$.
c $(\mu_A - \mu_B)$.

13.28 Refer to Exercise 13.12.


a Construct a 95% confidence interval for the mean percentage of carbon-14 that remains in 
the vegetable pulp when the low level of acetonitrile is used. 

b Give a 90% confidence interval for the difference in mean percentages of carbon-14 that 
remain in the vegetable pulp for low and medium levels of acetonitrile. 


13.29 Refer to Exercise 13.13.


a Give a 95% confidence interval for the mean left-turn maneuver time for buses and trucks. 


b Estimate the difference in mean maneuver times for small and large cars with a 95% 
confidence interval. 


c The study report by Lu involved vehicles that passed through the intersection of Guadalupe
Avenue and 38th Street in Austin, Texas. Do you think that the results in parts (a) and
(b) would be valid for a “nonprotected” intersection in your hometown? Why or why not?


13.30 It has been hypothesized that treatments (after casting) of a plastic used in optic lenses will
improve wear. Four different treatments are to be tested. To determine whether any differences 
in mean wear exist among treatments, 28 castings from a single formulation of the plastic were 
made and 7 castings were randomly assigned to each of the treatments. Wear was determined 
by measuring the increase in “haze” after 200 cycles of abrasion (better wear being indicated 
by smaller increases). The data collected are reported in the accompanying table. 


           Treatment
A        B        C        D
9.6      11.95    11.47    11.35
13.29    15.15    9.54     8.73
12.07    14.75    11.26    10.00
11.97    14.79    13.66    9.75
13.31    15.48    11.18    11.71
12.32    13.47    15.03    12.45
11.78    13.06    14.86    12.38

a Is there evidence of a difference in mean wear among the four treatments? Use $\alpha = .05$.


b Estimate the mean difference in haze increase between treatments B and C, using a 99%
confidence interval.


c Find a 90% confidence interval for the mean wear for lenses receiving treatment A.


13.31 With the ongoing energy crisis, researchers for the major oil companies are attempting to find
alternative sources of oil. It is known that some types of shale contain small amounts of oil
that feasibly (if not economically) could be extracted. Four methods have been developed for
extracting oil from shale, and the government has decided that some experimentation should be
done to determine whether the methods differ significantly in the average amount of oil that each
can extract from the shale. Method 4 is known to be the most expensive method to implement,
and method 1 is the least expensive, so inferences about the differences in performance of these
two methods are of particular interest. Sixteen bits of shale (of the same size) were randomly
subjected to the four methods, with the results shown in the accompanying table (the units are
in liters per cubic meter). All inferences are to be made with $\alpha = .05$.

Method 1    Method 2    Method 3    Method 4
3           2           5           5
2           2           2           2
1           4           5           4
2           4           1           5

a Assuming that the 16 experimental units were as alike as possible, implement the appropriate
ANOVA to determine whether there is any significant difference among the mean
amounts extracted by the four methods. Use $\alpha = .05$.

b Set up a 95% confidence interval for the difference in the mean amounts extracted by the
two methods of particular interest. Interpret the result.


13.32 Refer to Exercise 13.14. Construct a 95% confidence interval for the mean amount of residue
from DDT.

13.33 Refer to Exercise 13.15. Compare the mean dissolved oxygen content in midstream above the
plant with the mean content adjacent to the plant (location 2 versus location 3). Use a 95%
confidence interval.

13.34 Refer to Exercise 13.15. Compare the mean dissolved oxygen content for the two locations
above the plant with the mean content slightly downriver from the plant, by finding a 95%
confidence interval for $(1/2)(\mu_1 + \mu_2) - \mu_4$.

13.35 Refer to Exercise 13.16. The average increases in heart rate for the ten individuals in each age
category were

Age      Sample Size    Average Heart Rate Increase
10-19    10             30.9
20-39    10             27.5
40-59    10             29.5
60-69    10             28.2

a Find a 90% confidence interval for the difference in mean increase in heart rate for the
10-19 and 60-69 age groups.

b Find a 90% confidence interval for the mean increase in heart rate for the 20-39 age group.
13.8 A Statistical Model for the Randomized Block Design


The method for constructing a randomized block design was presented in Section 
12.4. As previously indicated in Definition 12.6, the randomized block design is a 
design for comparing k treatments using b blocks. The blocks are selected so that, 
hopefully, the experimental units within each block are essentially homogeneous. The 
treatments are randomly assigned to the experimental units in each block in such a 
way that each treatment appears exactly once in each of the b blocks. Thus, the total 
number of observations obtained in a randomized block design is n = bk. Implicit 
in the consideration of a randomized block design is the presence of two qualitative 
independent variables, “blocks” and “treatments.” In this section, we present a formal 
statistical model for the randomized block design. 


Statistical Model for a Randomized Block Design
For $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, b$,

\[ Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \]

where $Y_{ij}$ = the observation on treatment $i$ in block $j$,
$\mu$ = the overall mean,
$\tau_i$ = the nonrandom effect of treatment $i$, where $\sum_{i=1}^{k} \tau_i = 0$,
$\beta_j$ = the nonrandom effect of block $j$, where $\sum_{j=1}^{b} \beta_j = 0$,
$\varepsilon_{ij}$ = random error terms such that the $\varepsilon_{ij}$ are independent normally
distributed random variables with $E(\varepsilon_{ij}) = 0$ and $V(\varepsilon_{ij}) = \sigma^2$.


Notice that $\mu$, $\tau_1, \tau_2, \ldots, \tau_k$, and $\beta_1, \beta_2, \ldots, \beta_b$ are all assumed
to be unknown constants. This model differs from that for the completely randomized design
(a specific type of one-way layout) only in containing parameters associated with
the different blocks. Because the block effects are assumed to be fixed but unknown,
this model usually is referred to as the fixed block effects model. A random block
effects model, another model for the randomized block design in which the $\beta$'s are
assumed to be random variables, is considered in the supplementary exercises. Our
formal development in the body of this text is restricted to the fixed block effects
model.

The statistical model just presented very clearly summarizes all the assumptions 
made in the analysis of data in a randomized block design with fixed block effects. 
Let us consider the observation $Y_{ij}$ made on treatment $i$ in block $j$. Notice that the
assumptions in the model imply that $E(Y_{ij}) = \mu + \tau_i + \beta_j$ and $V(Y_{ij}) = \sigma^2$ for
$i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, b$. Let us consider the observations made on
treatment $i$ and observe that two observations receiving treatment $i$ have means that
differ only by the difference of the block effects. For example,

\[ E(Y_{i1}) - E(Y_{i2}) = \mu + \tau_i + \beta_1 - (\mu + \tau_i + \beta_2) = \beta_1 - \beta_2. \]

Similarly, two observations that are taken from the same block have means that differ
only by the difference of the treatment effects. That is, if $i \ne i'$,

\[ E(Y_{ij}) - E(Y_{i'j}) = \mu + \tau_i + \beta_j - (\mu + \tau_{i'} + \beta_j) = \tau_i - \tau_{i'}. \]

Observations that are taken on different treatments and in different blocks have means
that differ by the difference in the treatment effects plus the difference in the block
effects because, if $i \ne i'$ and $j \ne j'$,

\[ E(Y_{ij}) - E(Y_{i'j'}) = \mu + \tau_i + \beta_j - (\mu + \tau_{i'} + \beta_{j'}) = (\tau_i - \tau_{i'}) + (\beta_j - \beta_{j'}). \]
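
The three comparisons of expected responses can be checked numerically under the fixed block effects model. The Python sketch below uses hypothetical parameter values (an overall mean, three treatment effects, and four block effects, each set of effects summing to zero) and zero-based indices; none of these numbers come from the text.

```python
def expected_response(mu, taus, betas, i, j):
    """E(Y_ij) = mu + tau_i + beta_j under the fixed block effects model."""
    return mu + taus[i] + betas[j]


# Hypothetical parameters (illustrative only); each set of effects sums to zero.
mu = 10.0
taus = [1.5, -0.5, -1.0]          # k = 3 treatment effects
betas = [2.0, 0.0, -1.0, -1.0]    # b = 4 block effects

# Same treatment, different blocks: the difference is beta_1 - beta_2.
same_treatment = expected_response(mu, taus, betas, 0, 0) - expected_response(mu, taus, betas, 0, 1)
# Same block, different treatments: the difference is tau_i - tau_i'.
same_block = expected_response(mu, taus, betas, 0, 2) - expected_response(mu, taus, betas, 1, 2)
# Different treatment and block: sum of the two kinds of difference.
both_differ = expected_response(mu, taus, betas, 0, 0) - expected_response(mu, taus, betas, 2, 3)
print(same_treatment, same_block, both_differ)
```

In each case the overall mean cancels, so only the relevant effect differences remain.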


In the next section, we proceed with an analysis of the data obtained from a 
randomized block design. 


Exercises

13.36 State the assumptions underlying the ANOVA for a randomized block design with fixed block
effects.

13.37 According to the model for the randomized block design given in this section, the expected
response when treatment $i$ is applied in block $j$ is $E(Y_{ij}) = \mu + \tau_i + \beta_j$, for
$i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, b$.

a Use the model given in this section to calculate the average of the $n = bk$ expected responses
associated with all of the blocks and treatments.

b Give an interpretation for the parameter $\mu$ that appears in the model for the randomized
block design.

13.38 Let $\bar{Y}_{i\bullet}$ denote the average of all of the responses to treatment $i$. Use the model for
the randomized block design to derive $E(\bar{Y}_{i\bullet})$ and $V(\bar{Y}_{i\bullet})$. Is
$\bar{Y}_{i\bullet}$ an unbiased estimator for the mean response to treatment $i$? Why or why not?

13.39 Refer to Exercise 13.38 and consider $\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ for $i \ne i'$.

a Show that $E(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) = \tau_i - \tau_{i'}$. This result implies that
$\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ is an unbiased estimator of the difference in the effects of
treatments $i$ and $i'$.

b Derive $V(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet})$.

13.40 Refer to the model for the randomized block design and let $\bar{Y}_{\bullet j}$ denote the average
of all of the responses in block $j$.

a Derive $E(\bar{Y}_{\bullet j})$ and $V(\bar{Y}_{\bullet j})$.

b Show that $\bar{Y}_{\bullet j} - \bar{Y}_{\bullet j'}$ is an unbiased estimator for
$\beta_j - \beta_{j'}$, the difference in the effects of blocks $j$ and $j'$.

c Derive $V(\bar{Y}_{\bullet j} - \bar{Y}_{\bullet j'})$.
13.9 The Analysis of Variance for a Randomized Block Design


The ANOVA for a randomized block design proceeds much like that for a completely 
randomized design (which is a special case of the one-way layout). In the randomized 
block design, the total sum of squares, Total SS, is partitioned into three parts: the 
sum of squares for blocks, treatments, and error. 

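Denote the total and average of all observations in block $j$ as $Y_{\bullet j}$ and
$\bar{Y}_{\bullet j}$, respectively. Similarly, let $Y_{i\bullet}$ and $\bar{Y}_{i\bullet}$ represent the
total and the average for all observations receiving treatment $i$. Again, the “dots” in the
subscripts indicate which index is “summed over” to compute the totals and “averaged over” to
compute the averages. Then for a randomized block design involving $b$ blocks and $k$ treatments,
we have the following sums of squares:

\[
\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{b} (Y_{ij} - \bar{Y})^2
= \sum_{i=1}^{k} \sum_{j=1}^{b} Y_{ij}^2 - \text{CM}
= \text{SSB} + \text{SST} + \text{SSE}, \quad \text{where}
\]
\[
\text{SSB} = k \sum_{j=1}^{b} (\bar{Y}_{\bullet j} - \bar{Y})^2 = \sum_{j=1}^{b} \frac{Y_{\bullet j}^2}{k} - \text{CM},
\]
\[
\text{SST} = b \sum_{i=1}^{k} (\bar{Y}_{i\bullet} - \bar{Y})^2 = \sum_{i=1}^{k} \frac{Y_{i\bullet}^2}{b} - \text{CM},
\]
\[
\text{SSE} = \text{Total SS} - \text{SSB} - \text{SST}.
\]

In the preceding formulas,

\[
\bar{Y} = (\text{average of all } n = bk \text{ observations}) = \frac{1}{bk} \sum_{j=1}^{b} \sum_{i=1}^{k} Y_{ij}
\]

and

\[
\text{CM} = \frac{(\text{total of all observations})^2}{n}
= \frac{1}{bk} \left( \sum_{j=1}^{b} \sum_{i=1}^{k} Y_{ij} \right)^2.
\]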

The ANOVA table for the randomized block design is presented in Table 13.5. 
The degrees of freedom associated with each sum of squares are shown in the second 
column. Mean squares are calculated by dividing the sum of squares by their respective 
degrees of freedom. 

Table 13.5 ANOVA table for a randomized block design

Source        df               SS          MS
Blocks        b - 1            SSB         MSB = SSB/(b - 1)
Treatments    k - 1            SST         MST = SST/(k - 1)
Error         n - b - k + 1    SSE         MSE
Total         n - 1            Total SS

To test the null hypothesis that there is no difference in treatment means, we use
the F statistic

\[ F = \frac{\text{MST}}{\text{MSE}} \]

and reject the null hypothesis if $F > F_\alpha$, based on $\nu_1 = (k - 1)$ and $\nu_2 = (n - b - k + 1)$
numerator and denominator degrees of freedom, respectively.

As discussed in Section 12.4, blocking can be used to control for an extraneous
source of variation (the variation between blocks). In addition, with blocking, we
have the opportunity to see whether evidence exists to indicate a difference in the
mean response for blocks. Under the null hypothesis that there is no difference in
mean response for blocks (that is, $\beta_j = 0$, for $j = 1, 2, \ldots, b$), the mean square for
blocks (MSB) provides an unbiased estimator for $\sigma^2$ based on $(b - 1)$ df. Where real
differences exist among block means, MSB will tend to be inflated in comparison
with MSE, and

\[ F = \frac{\text{MSB}}{\text{MSE}} \]

provides a test statistic. As in the test for treatments, the rejection region for the
test is

\[ F > F_\alpha, \]

where $F$ has $\nu_1 = b - 1$ and $\nu_2 = n - b - k + 1$ numerator and denominator degrees
of freedom, respectively.
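
The shortcut formulas for the randomized block sums of squares translate directly into code. The following Python sketch is an illustration under stated assumptions: the function name `randomized_block_anova` and the data matrix (3 treatments in rows, 4 blocks in columns) are hypothetical and are not taken from any example in the text.

```python
def randomized_block_anova(data):
    """ANOVA quantities for data[i][j] = observation on treatment i in block j."""
    k, b = len(data), len(data[0])          # k treatments, b blocks
    n = k * b
    total = sum(sum(row) for row in data)
    cm = total ** 2 / n                     # correction for the mean
    total_ss = sum(y ** 2 for row in data for y in row) - cm
    sst = sum(sum(row) ** 2 for row in data) / b - cm
    block_totals = [sum(data[i][j] for i in range(k)) for j in range(b)]
    ssb = sum(t ** 2 for t in block_totals) / k - cm
    sse = total_ss - ssb - sst
    mse = sse / (n - b - k + 1)
    return {
        "Total SS": total_ss, "SSB": ssb, "SST": sst, "SSE": sse,
        "F treatments": (sst / (k - 1)) / mse,
        "F blocks": (ssb / (b - 1)) / mse,
    }


# Hypothetical data (illustrative only): rows are treatments, columns are blocks.
data = [[4, 6, 5, 7],
        [6, 8, 6, 10],
        [5, 7, 4, 8]]
table = randomized_block_anova(data)
for name, value in table.items():
    print(name, round(value, 4))
```

Each F ratio would then be compared with the tabled $F_\alpha$ for the corresponding numerator degrees of freedom and $n - b - k + 1$ denominator degrees of freedom.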


EXAMPLE 13.5 A stimulus-response experiment involving three treatments was laid out in a
randomized block design using four subjects. The response was the length of time until
reaction, measured in seconds. The data, arranged in blocks, are shown in Figure 13.2.
The treatment number is circled and shown above each observation. Do the data
present sufficient evidence to indicate a difference in the mean responses for stimuli
(treatments)? Subjects? Use $\alpha = .05$ for each test and give the associated p-values.

FIGURE 13.2 Randomized block design for Example 13.5 (figure not reproduced)


Table 13.6 ANOVA table for Example 13.5

Source        df    SS      MS       F
Blocks        3     3.48    1.160    15.47
Treatments    2     5.48    2.740    36.53
Error         6     .45     .075
Total         11    9.41

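Solution The observed values of the sums of squares for the ANOVA are shown jointly in
Table 13.6 and individually as follows:

\[ \text{CM} = \frac{(\text{total})^2}{n} = \frac{(21.2)^2}{12} = 37.45, \]
\[ \text{Total SS} = \sum_{j=1}^{4} \sum_{i=1}^{3} (y_{ij} - \bar{y})^2 = \sum_{j=1}^{4} \sum_{i=1}^{3} y_{ij}^2 - \text{CM} = 46.86 - 37.45 = 9.41, \]
\[ \text{SSB} = \sum_{j=1}^{4} \frac{Y_{\bullet j}^2}{3} - \text{CM} = 40.93 - 37.45 = 3.48, \]
\[ \text{SST} = \sum_{i=1}^{3} \frac{Y_{i\bullet}^2}{4} - \text{CM} = 42.93 - 37.45 = 5.48, \]
\[ \text{SSE} = \text{Total SS} - \text{SSB} - \text{SST} = 9.41 - 3.48 - 5.48 = .45. \]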


We use the ratio of MST and MSE to test a hypothesis of no difference in the mean
response for treatments. Thus, the calculated value of F is

\[ F = \frac{\text{MST}}{\text{MSE}} = \frac{2.74}{.075} = 36.53. \]

The critical value of the F statistic ($\alpha = .05$) for $\nu_1 = 2$ and $\nu_2 = 6$ df is $F_{.05} = 5.14$.
Because the computed value of F exceeds the critical value, there is sufficient evidence
at the $\alpha = .05$ level to reject the null hypothesis and conclude that real differences
do exist among the expected responses for the three stimuli. The corresponding
p-value = $P(F > 36.53)$, which, based on Table 7, Appendix 3, is such that
p-value < .005. The applet F-Ratio Probabilities and Quantiles provides the exact
p-value = $P(F > 36.53) = .00044$.

A similar test may be conducted for the null hypothesis that no difference exists in
the mean response for subjects. Rejection of this hypothesis would imply that there are
significant differences among subjects and that blocking is desirable. The computed
value of F based on $\nu_1 = 3$ and $\nu_2 = 6$ df is

\[ F = \frac{\text{MSB}}{\text{MSE}} = \frac{1.16}{.075} = 15.47. \]

Because this value of F exceeds the corresponding tabulated critical value, $F_{.05} = 4.76$,
we reject the null hypothesis and conclude that a real difference exists in the mean
responses among the four groups of subjects. The applet yields that the associated
p-value = $P(F > 15.47) = .00314$. Based upon Table 7, Appendix 3, we would
have concluded only that p-value < .005. Regardless, we conclude that blocking by
subjects was beneficial.
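
The mean squares and F ratios of Table 13.6 follow from the sums of squares by simple division, as the short Python sketch below illustrates. The sums of squares, degrees of freedom, and critical values are copied from Example 13.5; the variable names are assumptions of this sketch, and the p-values quoted in the example are not recomputed here.

```python
# Sums of squares and degrees of freedom as reported in Table 13.6.
ss = {"Blocks": 3.48, "Treatments": 5.48, "Error": 0.45}
df = {"Blocks": 3, "Treatments": 2, "Error": 6}

# Mean squares: each sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

f_treatments = ms["Treatments"] / ms["Error"]
f_blocks = ms["Blocks"] / ms["Error"]

# Critical values F_.05 quoted in Example 13.5 for (2, 6) and (3, 6) df.
reject_treatments = f_treatments > 5.14
reject_blocks = f_blocks > 4.76

print(round(f_treatments, 2), reject_treatments)
print(round(f_blocks, 2), reject_blocks)
```

Both ratios exceed their critical values, in agreement with the conclusions of the example.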


Exercises 


13.41 In Exercise 12.10, a matched-pairs analysis was performed to compare the differences in mean
CPU time to run benchmark programs on two computers. The data are reproduced in the
following table.

                     Benchmark Program
Computer    1       2       3       4       5       6
1           1.12    1.73    1.00    1.86    1.47    2.10
2           1.15    1.72    1.10    1.87    1.46    2.15

a Treat the six programs as six blocks and test for a difference between the mean CPU times
for the two computers by using a randomized block analysis. Use $\alpha = .05$. How does your
decision compare to that reached in Exercise 12.10(a)?

b Give bounds for the associated p-value. How does your answer compare to your answer to
Exercise 12.10(b)?

c Applet Exercise Use the applet F-Ratio Probabilities and Quantiles to find the exact
p-value.

d How does the computed value of MSE compare to the value for $s_D^2$ that you used in your
solution to Exercise 12.10?


13.42 The accompanying table presents data on yields relating to resistance to stain for three materials
($M_1$, $M_2$, and $M_3$) treated with four chemicals in a randomized block design. (A low value
indicates good stain resistance.)

                  Material
Chemical    M1    M2    M3    Total
A           5     9     7     21
B           3     8     4     15
C           8     13    9     30
D           4     6     8     18
Total       20    36    28    84

\[ \sum_i \sum_j y_{ij}^2 = 674, \qquad \frac{1}{12} \left( \sum_i \sum_j y_{ij} \right)^2 = 588 \]

a Is there evidence of differences in mean resistance among the four chemicals? Give bounds
for the p-value.

b What would you conclude at the $\alpha = .05$ level of significance?

13.43 Refer to Exercise 13.42. Why was a randomized block design used to compare the chemicals?


Do average automobile insurance costs differ for different insurance companies? Other vari- 
ables that impact insurance costs are geographic location, ages of the drivers, and type of 
coverage. The following are estimates (in dollars) of the cost of 6-month policies for basic 
liability coverage for a single man who has been licensed for 6 to 8 years, has no violations or
accidents, and drives between 12,600 and 15,000 miles per year.⁶


                                Insurance Company

                    21st                          Fireman's    State
Location          Century   Allstate     AAA        Fund       Farm
Riverside           736       745        668        1065       1202
San Bernardino      836       725        618         869       1172
Hollywood          1492      1384       1214        1502       1682
Long Beach          996       884        802        1571       1272


a What type of design was used in the collection of this data? 

b Is there sufficient evidence to indicate that average insurance premiums differ from company
to company?

c Is there sufficient evidence to indicate that insurance premiums differ from location to location?


d Applet Exercise Use the applet F-Ratio Probabilities and Quantiles to find the p-values 
associated with the tests in parts (b) and (c). 


An experiment was conducted to determine the effect of three methods of soil preparation on 
the first-year growth of slash pine seedlings. Four locations (state forest lands) were selected, 
and each location was divided into three plots. Because soil fertility within a location was likely 
to be more homogeneous than between locations, a randomized block design was employed, 
using locations as blocks. The methods of soil preparation were A (no preparation), B (light 
fertilization), and C (burning). Each soil preparation was randomly applied to a plot within 
each location. On each plot the same number of seedlings was planted, and the observation 
recorded was the average first-year growth (in centimeters) of the seedlings on each plot. These 
observations are reproduced in the accompanying table.


6. Source: “2003 Auto Insurance,” California Department of Insurance, http:cdinswww.insurance.ca.gov/
pls/wu-survey-auto/apsw-get-premS$auto-mc.querylist, 23 April 2004.


   Soil               Location
Preparation     1     2     3     4
     A         11    13    16    10
     B         15    17    20    12
     C         10    15    13    10


a Conduct an ANOVA. Do the data provide sufficient evidence to indicate differences in the 
mean growth for the three soil preparations? 


b Is there evidence to indicate differences in mean growth for the four locations? 


A. E. Dudeck and C. H. Peacock report on an experiment conducted to evaluate the performance 
of several cool-season grasses for winter overseeding of golf greens in northern Florida. One 
of the variables of interest was the distance that a golf ball would roll on a green after being 
rolled down a ramp (used to induce a constant initial velocity to the ball). Because the distance 
that the ball would roll was influenced by the slope of the green and the direction in which the 
grass was mowed, the experiment was set up in a randomized block design. The blocks were 
determined so that the slopes of the individual plots were constant within blocks (a transit was 
used to ensure accuracy), and all plots were mowed in the same direction and at the same height 
to eliminate mowing effects. The base grass was “Tiftgreen” Bermuda grass in a semidormant 
state. The same method of seeding and rates of application were used for all the ryegrasses that 
are represented in the following table of data. Measurements are average distances (in meters) 
from the base of the ramp to the stopping points for five balls rolled down the ramp and directly 
up the slope on each plot. Cultivars used in the study included A (Pennfine ryegrass), B (Dasher 
ryegrass), C (Regal ryegrass), D (Marvelgreen supreme), and E (Barry ryegrass). The grasses 
were planted within blocks and yielded the measurements shown.⁷


                             Variety
Block       A        B        C        D        E       Total
  1       2.764    2.568    2.506    2.612    2.238    12.688
  2       3.043    2.977    2.533    2.675    2.616    13.844
  3       2.600    2.183    2.334    2.164    2.127    11.408
  4       3.049    3.028    2.895    2.724    2.697    14.393
Total    11.456   10.756   10.268   10.175    9.678    52.333


a Perform the appropriate ANOVA to test for sufficient evidence to indicate that the mean 
distance of ball roll differs for the five cultivars. Give bounds for the attained significance 
level. What would you conclude at the $\alpha = .01$ level of significance?


b Is there evidence of a significant difference between the blocks used in the experiment?
Test using $\alpha = .05$.


7. Source: A. E. Dudeck and C. H. Peacock, “Effects of Several Overseeded Ryegrasses on Turf Quality,
Traffic Tolerance and Ball Roll,” Proceedings of the Fourth International Turfgrass Research Conference,
R. W. Sheard, ed., pp. 75-81. Ontario Agricultural College, University of Guelph, Guelph, Ontario, and
the International Turfgrass Society, 1981.


Refer to Exercise 13.31. Suppose that we now find out that the 16 experimental units were
obtained in the following manner. One sample was taken from each of four locations, each
individual sample was split into four parts, and then each method was applied to exactly one
part from each location (with the proper randomization). The data are now presented more
correctly in the form shown in the accompanying table. Does this new information suggest a
more appropriate method of analysis than that used in Exercise 13.31? If so, perform the new
analysis and answer the question in Exercise 13.31(a). Is this new information worthwhile?


Location    Method 1    Method 2    Method 3    Method 4
   I            3           2           5           5
   II           2           2           2           2
   III          1           4           5           4
   IV           2           4           1           2


Suppose that a randomized block design with b blocks and k treatments has each treatment mea- 
sured twice in each block. Indicate how you would perform the computations for an ANOVA. 


An evaluation of diffusion bonding of zircaloy components is performed. The main objective 
is to determine which of three elements—nickel, iron, or copper—is the best bonding agent. 
A series of zircaloy components are bonded using each of the possible bonding agents. Due 
to significant variation in components machined from different ingots, a randomized block 
design is used, blocking on the ingots. Two components from each ingot are bonded together 
using each of the three agents, and the pressure (in units of 1000 pounds per square inch) 
required to separate the bonded components is measured. The data shown in the following 
table are obtained. Is there evidence of a difference in mean pressures required to separate the 
components among the three bonding agents? Use $\alpha = .05$.


           Bonding Agent
Ingot    Nickel    Iron    Copper
  1       67.0     71.9     72.2
  2       67.5     68.8     66.4
  3       76.0     82.6     74.5
  4       72.7     78.1     67.3
  5       73.1     74.2     73.2
  6       65.8     70.8     68.7
  7       75.6     84.9     69.0


From time to time, one branch office of a company must make shipments to another branch 
office in another state. Three package-delivery services operate between the two cities where 
the branch offices are located. Because the price structures for the three delivery services are 
quite similar, the company wants to compare the delivery times. The company plans to make 
several different types of shipments to its branch office. To compare the carriers, the company 
sends each shipment in triplicate, one with each carrier. The results listed in the accompanying 
table are the delivery times in hours. 


               Carrier
Shipment      I       II      III
   1        15.2    16.9    17.1
   2        14.3    16.4    16.1
   3        14.7    15.9    15.7
   4        15.1    16.7    17.0
   5        14.0    15.6    15.5


a Is there evidence of a difference in mean delivery times among the three carriers? Give 
bounds for the attained significance level. 


b Why was the experiment conducted using a randomized block design?


Refer to the model for the randomized block design presented in Section 13.8.


a Derive E(MST). 
b Derive E(MSB). 
c Derive E(MSE). 


Notice that these quantities appear in the F statistics used to test for differences in the mean 
response among the blocks and among the treatments. 


Estimation in the Randomized 
Block Design 


The confidence interval for the difference between a pair of treatment means in a
randomized block design is completely analogous to that associated with the completely
randomized design (a special case of the one-way layout) in Section 13.7. A
$100(1 - \alpha)\%$ confidence interval for $\tau_i - \tau_{i'}$ is

\[
(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) \pm t_{\alpha/2}\, S \sqrt{\frac{2}{b}},
\]

where $n_i = n_{i'} = b$, the number of observations contained in a treatment mean,
and $S = \sqrt{\text{MSE}}$. The difference between the confidence intervals for the completely
randomized and the randomized block designs is that the value $t_{\alpha/2}$ is based on
$\nu = n - b - k + 1 = (b - 1)(k - 1)$ df and that $S$, appearing in the preceding
expression, is obtained from the ANOVA table associated with the randomized block
design.
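
The interval above is easy to evaluate numerically. The following minimal sketch assumes that the t-value is read from a table rather than computed. Its inputs are chosen to be consistent with Example 13.6 (treatment means .98 and 2.63, error based on 6 df); the number of blocks b = 4 and the value s = .27 are assumptions made for illustration, and the function name `block_design_ci` is hypothetical.

```python
import math


def block_design_ci(mean_i, mean_j, t_value, s, b):
    """Interval for the difference of two treatment means in a
    randomized block design with b blocks: diff +/- t * s * sqrt(2/b)."""
    half_width = t_value * s * math.sqrt(2.0 / b)
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


# Treatment means .98 and 2.63 and t_.025 = 2.447 (6 df) come from the text.
# b = 4 blocks and s = .27 are ASSUMED values, chosen to match 6 df and the
# reported half-width of about .47.
lower, upper = block_design_ci(0.98, 2.63, t_value=2.447, s=0.27, b=4)
print(f"({lower:.2f}, {upper:.2f})")
```

Because the half-width shrinks like $1/\sqrt{b}$, quadrupling the number of blocks only halves the width of the interval.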


EXAMPLE 13.6

Construct a 95% confidence interval for the difference between the mean responses
for treatments 1 and 2, Example 13.5.


Solution

The confidence interval for the difference in mean responses for a pair of treatments is

\[
(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) \pm t_{\alpha/2}\, S \sqrt{\frac{2}{b}},
\]

where for our example $t_{.025}$ is based on 6 df. For treatments 1 and 2, we have

\[
(.98 - 2.63) \pm (2.447)(.27)\sqrt{\frac{2}{4}}, \quad \text{or} \quad -1.65 \pm .47 = (-2.12, -1.18).
\]

Thus, at the 95% confidence level we conclude that the mean reaction time to stimulus 1
is between 1.18 and 2.12 seconds shorter than the mean reaction time to stimulus 2.


Exercises


Refer to Exercises 13.41 and 12.10. Find a 95% confidence interval for the difference in mean 
CPU times required for the two computers to complete a job. How does your answer compare 
to that obtained in Exercise 12.10(c)? 


Refer to Exercise 13.42. Construct a 95% confidence interval for the difference between mean 
resistances for chemicals A and B. 


Refer to Exercise 13.45. Construct a 90% confidence interval for the differences in mean growth 
for methods A and B. 


Refer to Exercise 13.46. Construct a 95% confidence interval for the difference in the mean 
distance of roll when Dasher ryegrass and Marvelgreen supreme are used for overseeding. 


Refer to Exercise 13.47. Construct a 95% confidence interval for the difference between the 
mean amounts of oil extracted by methods 1 and 4. Compare the answer to that obtained in 
Exercise 13.31(b). 


Refer to Exercise 13.49. Estimate the difference in mean pressures to separate components that 
are bonded with nickel and iron, using a 99% confidence interval. 


Selecting the Sample Size 


The method for selecting the sample size for the one-way layout (including the completely
randomized) or the randomized block design is an extension of the procedures
of Section 8.7. We confine our attention to the case of equal sample sizes,
$n_1 = n_2 = \cdots = n_k$, for the treatments of the one-way layout. The number of observations
per treatment is equal to the number of blocks $b$ for the randomized block
design. Thus, the problem is to determine $n_1$ or $b$ for these two designs so that the
resulting experiment contains the desired amount of information.

The determination of sample sizes follows a similar procedure for both designs;
we outline a general method. First, the experimenter must decide on the parameter
(or parameters) of major interest. Usually, this involves comparing a pair of treatment
means. Second, the experimenter must specify a bound on the error of estimation
that can be tolerated. Once this has been determined, the next task is to select $n_i$
(the size of the sample from population or treatment $i$) or, correspondingly, $b$ (the
number of blocks for a randomized block design) that will reduce the half-width of
the confidence interval for the parameter so that, at a prescribed confidence level,
it is less than or equal to the specified bound on the error of estimation. It should
be emphasized that the sample size solution always will be an approximation because
$\sigma$ is unknown and an estimate for $\sigma$ is unknown until the sample is acquired.
The best available estimate for $\sigma$ will be used to produce an approximate solution.
We illustrate the procedure with an example.


EXAMPLE 13.7

A completely randomized design is to be conducted to compare five teaching techniques
in classes of equal size. Estimation of the differences in mean response on
an achievement test is desired correct to within 30 test-score points, with probability
equal to .95. It is expected that the test scores for a given teaching technique will possess
a range approximately equal to 240. Find the approximate number of observations
required for each sample in order to acquire the specified information.


Solution

The confidence interval for the difference between a pair of treatment means is

\[
(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}.
\]

Therefore, we wish to select $n_i$ and $n_{i'}$ so that

\[
t_{\alpha/2}\, S \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}} \le 30.
\]

The value of $\sigma$ is unknown, and $S$ is a random variable. However, an approximate
solution for $n_i = n_{i'}$ can be obtained by conjecturing that the observed value of $s$
will be roughly equal to one-fourth of the range. Thus, $s \approx 240/4 = 60$. The value
of $t_{.025}$ will be based on $(n_1 + n_2 + \cdots + n_5 - 5)$ df, and for even moderate values of
$n_i$, $t_{.025}$ will approximately equal 2. Then

\[
t_{.025}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}} \approx (2)(60)\sqrt{\frac{2}{n_i}} = 30,
\]

or $n_i = 32$ for each of the five treatments.
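
The arithmetic of Example 13.7 can be sketched in a few lines. This is a rough illustration under the example's own approximations (t guessed as 2, s guessed as one-fourth of the range); the function name `per_treatment_sample_size` is hypothetical.

```python
import math


def per_treatment_sample_size(bound, s_guess, t_guess=2.0):
    """Smallest n satisfying t * s * sqrt(2/n) <= bound,
    using guessed values for t and s (equal sample sizes assumed)."""
    return math.ceil(2.0 * (t_guess * s_guess / bound) ** 2)


s_guess = 240 / 4                      # one-fourth of the anticipated range
n_i = per_treatment_sample_size(bound=30, s_guess=s_guess)
total = 5 * n_i                        # five teaching techniques
print(n_i, total)
```

Solving $(2)(60)\sqrt{2/n_i} = 30$ by hand gives the same value, $n_i = 32$, so about 160 observations would be needed in all.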


EXAMPLE 13.8

An experiment is to be conducted to compare the toxic effects of three chemicals on
the skin of rats. The resistance to the chemicals was expected to vary substantially
from rat to rat. Therefore, all three chemicals were to be tested on each rat, thereby
blocking out rat-to-rat differences.

The standard deviation of the experimental error was unknown, but prior experimentation
involving several applications of a similar chemical on the same type of
rat suggested a range of response measurements equal to 5 units.

Find a value for $b$ such that the error of estimating the difference between a pair
of treatment means is less than 1 unit, with probability equal to .95.


Solution

A very approximate value for $s$ is one-fourth of the range, or $s \approx 1.25$. Then, we
wish to select $b$ so that

\[
t_{.025}\, s \sqrt{\frac{1}{b} + \frac{1}{b}} = t_{.025}\, s \sqrt{\frac{2}{b}} < 1.
\]

Because $t_{.025}$ will depend on the degrees of freedom associated with $s^2$, which will
be $(n - b - k + 1)$, we will use the approximation $t_{.025} \approx 2$. Then,

\[
(2)(1.25)\sqrt{\frac{2}{b}} = 1, \quad \text{or} \quad b \approx 13.
\]

Approximately thirteen rats will be required to obtain the desired information. Since
we will make three observations ($k = 3$) per rat, our experiment will require that a
total of $n = bk = 13(3) = 39$ measurements be made.

The degrees of freedom associated with the resulting estimate $s^2$ will be
$(n - b - k + 1) = 39 - 13 - 3 + 1 = 24$, based on this solution. Therefore, the guessed value
of $t$ would seem to be adequate for this approximate solution.


The sample size solutions for Examples 13.7 and 13.8 are very approximate and
are intended to provide only a rough estimate of sample size and consequent costs of
the experiment. The actual lengths of the resulting confidence intervals will depend
on the data actually observed. These intervals may not have the exact lengths specified
by the experimenter but will have the required confidence coefficient. If the resulting
intervals are still too long, the experimenter can obtain information on $\sigma$ as the
data are being collected and can recalculate a better approximation to the number of
observations per treatment ($n_i$ or $b$) as the experiment proceeds.
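
As a rough check on how approximate these solutions are, the sketch below repeats the calculation of Example 13.8 and then recomputes the half-width with a tabulated t-value in place of the guess t ≈ 2. The value $t_{.025} = 2.064$ for 24 df is taken from a standard t table; the variable names are illustrative only.

```python
import math

range_guess = 5.0        # anticipated range of the responses
bound = 1.0              # desired bound on the error of estimation
k = 3                    # number of treatments (chemicals)

s_guess = range_guess / 4.0                      # s guessed as range / 4
b = math.ceil(2.0 * (2.0 * s_guess / bound) ** 2)  # uses the guess t = 2
n_total = b * k
error_df = (b - 1) * (k - 1)

# Recompute the half-width with the tabulated t_.025 for 24 df (standard
# t table value; valid only because error_df works out to 24 here).
t_24 = 2.064
half_width = t_24 * s_guess * math.sqrt(2.0 / b)

print(b, n_total, error_df, round(half_width, 3))
```

Under these assumptions the half-width comes out at about 1.01, marginally above the target of 1 unit, which is consistent with the text's description of the solution as only a rough guide.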


Exercises 


Refer to Exercise 13.9. 


a About how many specimens per concrete mix should be prepared to allow estimation of 
the difference in mean strengths for a preselected pair of specimen types to within .02 ton 
per square inch? Assume knowledge of the data given in Exercise 13.9. 


b What is the total number of observations required in the entire experiment? 

Refer to Exercises 13.10 and 13.27(a). Approximately how many observations would be
necessary to estimate $\mu_A$ to within 10 units? Use a 95% confidence coefficient.


Refer to Exercises 13.10 and 13.27(c).


a Assuming equal sample sizes for each treatment, approximately how many observations
from method A and method B are necessary to estimate $\mu_A - \mu_B$ to within 20 units? Use
a 95% confidence coefficient.


b What is the total number of observations required in the entire experiment?


Refer to Exercise 13.45.


a How many locations need to be used to estimate the difference between the mean growth 
for any two specified soil preparations to within 1 unit, with confidence coefficient .95? 


b What is the total number of observations required in the entire experiment? 


Refer to Exercises 13.47 and 13.55. How many locations should be used if it is desired to
estimate $\mu_1 - \mu_4$ to within .5 unit, with confidence coefficient .95?


Simultaneous Confidence Intervals 
for More Than One Parameter 


The methods of Section 13.7 can be used to construct $100(1 - \alpha)\%$ confidence
intervals for a single treatment mean or for the difference between a pair of treatment
means in a one-way layout. Suppose that in the course of an analysis we wish to
construct several of these confidence intervals. The method of Section 13.10 can be
used to compare a pair of treatment means in a randomized block design. Although it
is true that each interval will enclose the estimated parameter with probability $(1 - \alpha)$,
what is the probability that all the intervals will enclose their respective parameters?
The objective of this section is to present a procedure for forming sets of confidence
intervals so that the simultaneous confidence coefficient is no smaller than $(1 - \alpha)$
for any specified value of $\alpha$.

Suppose that we want to find confidence intervals $I_1, I_2, \ldots, I_m$ for parameters
$\theta_1, \theta_2, \ldots, \theta_m$ so that

\[
P(\theta_j \in I_j \text{ for all } j = 1, 2, \ldots, m) \ge 1 - \alpha.
\]

This goal can be achieved by using a simple probability inequality, known as the
Bonferroni inequality (recall Exercise 2.104). For any events $A_1, A_2, \ldots, A_m$, we
have

\[
\overline{A_1 \cap A_2 \cap \cdots \cap A_m} = \overline{A}_1 \cup \overline{A}_2 \cup \cdots \cup \overline{A}_m.
\]

Therefore,

\[
P(A_1 \cap A_2 \cap \cdots \cap A_m) = 1 - P(\overline{A}_1 \cup \overline{A}_2 \cup \cdots \cup \overline{A}_m).
\]

Also, from the additive law of probability, we know that

\[
P(\overline{A}_1 \cup \overline{A}_2 \cup \cdots \cup \overline{A}_m) \le \sum_{j=1}^{m} P(\overline{A}_j).
\]

Hence, we obtain the Bonferroni inequality

\[
P(A_1 \cap A_2 \cap \cdots \cap A_m) \ge 1 - \sum_{j=1}^{m} P(\overline{A}_j).
\]

Suppose that $P(\theta_j \in I_j) = 1 - \alpha_j$ and let $A_j$ denote the event $\{\theta_j \in I_j\}$. Then,

\[
P(\theta_1 \in I_1, \ldots, \theta_m \in I_m) \ge 1 - \sum_{j=1}^{m} P(\theta_j \notin I_j) = 1 - \sum_{j=1}^{m} \alpha_j.
\]

If all $\alpha_j$'s, for $j = 1, 2, \ldots, m$, are chosen equal to $\alpha$, we can see that the simultaneous
confidence coefficient of the intervals $I_j$, for $j = 1, 2, \ldots, m$, could be as small as
$(1 - m\alpha)$, which is smaller than $(1 - \alpha)$ if $m > 1$. A simultaneous confidence
coefficient of at least $(1 - \alpha)$ can be ensured by choosing the confidence intervals $I_j$,
for $j = 1, 2, \ldots, m$, so that $\sum_{j=1}^{m} \alpha_j = \alpha$. One way to achieve this objective is if
each interval is constructed to have confidence coefficient $1 - (\alpha/m)$. We apply this
technique in the following example.
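
The bookkeeping behind the Bonferroni adjustment can be sketched as follows. The values $\alpha = .05$ and $m = 6$ are illustrative choices, and the function names are hypothetical.

```python
def bonferroni_levels(alpha, m):
    """Per-interval error rate alpha/m and the upper-tail area
    alpha/(2m) needed for a two-sided t interval."""
    per_interval = alpha / m
    return per_interval, per_interval / 2.0


def simultaneous_lower_bound(alphas):
    """Bonferroni lower bound 1 - sum(alpha_j) on the probability
    that every interval covers its parameter."""
    return 1.0 - sum(alphas)


per_interval, tail = bonferroni_levels(alpha=0.05, m=6)

# Splitting alpha equally guarantees joint coverage of at least 1 - alpha ...
bound_equal_split = simultaneous_lower_bound([per_interval] * 6)
# ... whereas six unadjusted 95% intervals are only guaranteed 1 - 6(.05).
bound_unadjusted = simultaneous_lower_bound([0.05] * 6)

print(round(per_interval, 5), round(tail, 5))
print(round(bound_equal_split, 2), round(bound_unadjusted, 2))
```

The bound is a guarantee, not an exact value: the actual joint coverage of six unadjusted intervals is usually well above the .70 that the inequality ensures.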


EXAMPLE 13.9

For the four treatments given in Example 13.2, construct confidence intervals for all
comparisons of the form $\mu_i - \mu_{i'}$, with simultaneous confidence coefficient no smaller
than .95.


Solution

The appropriate $100(1 - \alpha)\%$ confidence interval for a single comparison (say,
$\mu_1 - \mu_2$) is

\[
(\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}) \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]

Because there are six such differences to consider, each interval should have confidence
coefficient $1 - (\alpha/6)$. Thus, the corresponding t-value is $t_{\alpha/[2(6)]} = t_{\alpha/12}$. Because
we want simultaneous confidence coefficient at least .95, the appropriate t-value is
$t_{.05/12} = t_{.00417}$. Using Table 5, Appendix 3, the closest available table value is $t_{.005}$, so
we will use this to approximate the desired result. The MSE for the data in Example
13.2 is based on 19 df, so the table value is $t_{.005} = 2.861$.

Because $s = \sqrt{\text{MSE}} = \sqrt{63} = 7.937$, the interval for $\mu_1 - \mu_2$ among the six
with simultaneous confidence coefficient at least .95 is

\[
\mu_1 - \mu_2: \quad (75.67 - 78.43) \pm 2.861(7.937)\sqrt{\frac{1}{6} + \frac{1}{7}}, \quad \text{or} \quad -2.76 \pm 12.63.
\]

Analogously, the entire set of six realized intervals are

\begin{align*}
\mu_1 - \mu_2 &: \; -2.76 \pm 12.63 \\
\mu_1 - \mu_3 &: \; 4.84 \pm 13.11 \\
\mu_1 - \mu_4 &: \; -12.08 \pm 14.66 \\
\mu_2 - \mu_3 &: \; 7.60 \pm 12.63 \\
\mu_2 - \mu_4 &: \; -9.32 \pm 14.23 \\
\mu_3 - \mu_4 &: \; -16.92 \pm 14.66.
\end{align*}

We cannot achieve our objective of obtaining a set of six confidence intervals with
simultaneous confidence coefficient at least .95 because the t tables in the text are
too limited. Of course, more extensive tables of the t distributions are available.
Because each of our six intervals has confidence coefficient .99, we can claim that
the six intervals above have a simultaneous confidence coefficient of at least .94. The
applet Student's t Probabilities and Quantiles, applied with 19 df, yields $t_{.00417} = 2.9435$.
Intervals with simultaneous confidence coefficient .9499 can be obtained by
substituting $t_{.00417} = 2.9435$ in place of 2.861 in the above calculations.
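
The six intervals of Example 13.9 can be reproduced with a short script. This is a sketch under stated assumptions: the means 75.67 and 78.43, the sample sizes 6 and 7, MSE = 63, and the t-values are quoted in the example, whereas the remaining means (70.83, 87.75) and sample sizes (6, 4) are inferred here from the reported differences and half-widths and should be checked against Example 13.2.

```python
import math
from itertools import combinations

# Treatment means and sample sizes; entries for treatments 3 and 4 are
# INFERRED from the intervals reported in the example, not quoted directly.
means = {1: 75.67, 2: 78.43, 3: 70.83, 4: 87.75}
sizes = {1: 6, 2: 7, 3: 6, 4: 4}
mse = 63.0
t_table = 2.861          # tabulated t_.005 with 19 df, as used in the text


def bonferroni_intervals(means, sizes, mse, t_value):
    """Return {(i, j): (difference, half_width)} for every pair i < j."""
    s = math.sqrt(mse)
    result = {}
    for i, j in combinations(sorted(means), 2):
        half = t_value * s * math.sqrt(1.0 / sizes[i] + 1.0 / sizes[j])
        result[(i, j)] = (means[i] - means[j], half)
    return result


intervals = bonferroni_intervals(means, sizes, mse, t_table)
for (i, j), (diff, half) in intervals.items():
    print(f"mu{i} - mu{j}: {diff:.2f} +/- {half:.2f}")
```

Calling `bonferroni_intervals(means, sizes, mse, 2.9435)` instead uses the applet's value for $t_{.00417}$ and gives slightly wider intervals.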


We emphasize that the technique presented in this section guarantees simultaneous
coverage probabilities of at least $1 - \alpha$. The actual simultaneous coverage probability
can be much larger than the nominal value $1 - \alpha$. Other methods for constructing
simultaneous confidence intervals can be found in the books listed in the references
at the end of the chapter.


Exercises 


Refer to Example 13.9. The six confidence intervals for $\mu_i - \mu_j$ were obtained by using an
approximate (due to the limitation of the information in Table 5, Appendix 3) value for $t_{.00417}$.
Why do some of the intervals differ in length?


Refer to Exercise 13.63 and Example 13.9.

a Use the exact value for $t_{.00417}$ given in Example 13.9 to give a 99.166% interval for
$\mu_1 - \mu_2$. This interval is one of the six simultaneous intervals for $\mu_i - \mu_j$ with simultaneous
confidence coefficient no smaller than $.94996 \approx .95$.

b What is the ratio of the lengths of the intervals for $\mu_1 - \mu_2$ obtained in Example 13.9 and
part (a)?

c How does the ratio you obtained in part (b) compare to the ratio $t_{.005}/t_{.00417}$?

d Based on parts (b) and (c) and the interval for $\mu_1 - \mu_3$ given in Example 13.9, give a
99.166% interval for $\mu_1 - \mu_3$. As before, this is one of the six simultaneous intervals to
compare $\mu_i$ and $\mu_j$ with simultaneous confidence coefficient no smaller than $.94996 \approx .95$.


Refer to Exercise 13.13. Construct confidence intervals for all possible differences between 
mean maneuver times for the three vehicle classes so that the simultaneous confidence coeffi- 
cient is at least .95. Interpret the results. 


Refer to Exercise 13.12. After looking at the data, a reader of the report of Wheeler et al. 
noticed that the largest difference between sample means occurs when comparing high and 
low concentrations of acetonitrile. If a confidence interval for the difference in corresponding 
population means is desired, how would you suggest constructing this interval? 


Refer to Exercise 13.45. Construct confidence intervals for all possible differences among 
treatment (soil preparation) means so that the simultaneous confidence coefficient is at 
least .90. 


Refer to Exercises 13.31 and 13.47. Because method 4 is the most expensive, it is desired to 
compare it to the other three. Construct confidence intervals for the differences $\mu_1 - \mu_4$, $\mu_2 - \mu_4$,
and $\mu_3 - \mu_4$ so that the simultaneous confidence coefficient is at least .95.


Analysis of Variance Using Linear Models 


The methods for analyzing linear models presented in Chapter 11 can be adapted for
use in the ANOVA. We illustrate the method by formulating a linear model for data
obtained through a completely randomized design involving $k = 2$ treatments.

Let $Y_{ij}$ denote the random variable to be observed on the $j$th observation from
treatment $i$, for $i = 1, 2$. Let us define a dummy, or indicator, variable $x$ as follows:

\[
x = \begin{cases} 1, & \text{if the observation is from population 1,} \\ 0, & \text{otherwise.} \end{cases}
\]

Although such dummy variables can be defined in many ways, this definition is consistent
with the coding used in SAS and other statistical analysis computer programs.
Notice that with this coding $x$ is 1 if the observation is taken from population 1 and
$x$ is 0 if the observation is taken from population 2. If we use $x$ as an independent
variable in a linear model, we can model $Y_{ij}$ as

\[
Y_{ij} = \beta_0 + \beta_1 x + \varepsilon_{ij},
\]

where $\varepsilon_{ij}$ is a normally distributed random error with $E(\varepsilon_{ij}) = 0$ and $V(\varepsilon_{ij}) = \sigma^2$.
In this model,

\[
\mu_1 = E(Y_{1j}) = \beta_0 + \beta_1(1) = \beta_0 + \beta_1,
\]

and

\[
\mu_2 = E(Y_{2j}) = \beta_0 + \beta_1(0) = \beta_0.
\]

Thus, it follows that $\beta_1 = \mu_1 - \mu_2$ and a test of the hypothesis $\mu_1 - \mu_2 = 0$ is
equivalent to the test that $\beta_1 = 0$. Our intuition would suggest that $\hat{\beta}_0 = \bar{Y}_{2\bullet}$ and
$\hat{\beta}_1 = \bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}$ are good estimators of $\beta_0$ and $\beta_1$; indeed, it can be shown (proof
omitted) that these are the least-squares estimators obtained by fitting the preceding
linear model. We illustrate the use of this technique through reanalyzing the data
presented in Example 13.1.


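EXAMPLE 13.10

Fit an appropriate linear model to the data of Example 13.1 and test to see whether
there is a significant difference between $\mu_1$ and $\mu_2$.


Solution

The model, as indicated earlier, is given by

\[
Y_{ij} = \beta_0 + \beta_1 x + \varepsilon_{ij},
\]

where

\[
x = \begin{cases} 1, & \text{if the observation is from population 1,} \\ 0, & \text{otherwise.} \end{cases}
\]

The matrices used for the least-squares estimators are then the $12 \times 1$ vector $\mathbf{Y}$ of
observations from Example 13.1 (the six observations from population 1, beginning
6.1, 7.1, listed first) and

\[
\mathbf{X} = \begin{bmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \end{bmatrix}
\quad \text{(six rows of each type)}, \qquad
\mathbf{X}'\mathbf{X} = \begin{bmatrix} 12 & 6 \\ 6 & 6 \end{bmatrix}, \qquad
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1/6 & -1/6 \\ -1/6 & 1/3 \end{bmatrix}.
\]

The least-squares estimates are given by

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} 1/6 & -1/6 \\ -1/6 & 1/3 \end{bmatrix}
\begin{bmatrix} 91.9 \\ 43.7 \end{bmatrix}
= \begin{bmatrix} 8.033 \\ -.75 \end{bmatrix}.
\]

Notice that $\hat{\beta}_0 = 8.033 = \bar{Y}_{2\bullet}$ and $\hat{\beta}_1 = -.75 = \bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}$.

Further,

\[
\text{SSE} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = 5.8617
\]

is the same as the SSE calculated in Example 13.1. Therefore, $s^2 = \text{SSE}/(n - 2) =
.58617$, and $s = \sqrt{.58617} = .7656$.

To test $H_0: \beta_1 = 0$, we construct the t statistic (see Section 11.12):

\[
t = \frac{\hat{\beta}_1 - 0}{s\sqrt{c_{11}}} = \frac{-.75}{.7656\sqrt{1/3}} = -1.697.
\]

Because we are interested in a two-tailed test, the associated p-value is
$2P(t < -1.697) = 2P(t > 1.697)$, where $t$ is based on 10 df. Thus, using Table 5, Appendix
3, we obtain $.05 < P(t > 1.697) < .10$ and $.10 < p\text{-value} < .20$. Therefore, for
any $\alpha$-value less than .1, we cannot reject $H_0$. That is, there is insufficient evidence
to indicate that $\mu_1$ and $\mu_2$ differ.

This t test is equivalent to the F test of Example 13.1. In fact, the square of the
observed t-value is the observed F-value of Example 13.1.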
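
The matrix arithmetic in Example 13.10 is small enough to check directly. The sketch below works only from summary quantities of the example rather than from the raw data: SSE = 5.8617 and the total 91.9 are quoted in the text, while the population-1 total 43.7 and the entries of X'X are inferred here from the displayed inverse and estimates. The helper `invert_2x2` is hypothetical.

```python
import math


def invert_2x2(m):
    """Return the inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Summary quantities for Example 13.10 (partly inferred, see lead-in).
xtx = [[12.0, 6.0], [6.0, 6.0]]   # X'X: 12 observations, 6 with x = 1
xty = [91.9, 43.7]                # X'Y: overall total, population-1 total
sse = 5.8617                      # SSE as reported in the example
n = 12

xtx_inv = invert_2x2(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(2)) for i in range(2)]

s = math.sqrt(sse / (n - 2))      # estimate of sigma, 10 df
c11 = xtx_inv[1][1]               # diagonal element for beta_1
t_stat = beta[1] / (s * math.sqrt(c11))

print(f"beta0_hat = {beta[0]:.3f}, beta1_hat = {beta[1]:.3f}")
print(f"s = {s:.4f}, t = {t_stat:.3f}")
```

Squaring `t_stat` gives the F statistic of the equivalent one-way ANOVA, which is the equivalence noted at the end of the example.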


We illustrate the linear model approach to a more complicated analysis of variance 
problem by considering a randomized block design. 


EXAMPLE 13.11

An experiment was conducted to compare the effects of four chemicals A, B, C, and
D on water resistance in textiles. Three different bolts of material I, II, and III were
used, with each chemical treatment being applied to one piece of material cut from
each of the bolts. The data are given in Table 13.7. Write a linear model for this
experiment and test the hypothesis that there are no differences among mean water
resistances for the four chemicals. Use $\alpha = .05$.


Table 13.7 Data for Example 13.11

                          Treatments
Bolt of Material      A       B       C       D
       I            10.1    11.4     9.9    12.1
       II           12.2    12.9    12.3    13.4
       III          11.9    12.7    11.4    12.9


Solution

In formulating the model, we define $\beta_0$ as the mean response for treatment D on
material from bolt III, and then we introduce a distinct indicator variable for each
treatment and for each bolt of material (block). The model is

\[
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon,
\]

where

\begin{align*}
x_1 &= \begin{cases} 1, & \text{if material from bolt I is used,} \\ 0, & \text{otherwise,} \end{cases} &
x_2 &= \begin{cases} 1, & \text{if material from bolt II is used,} \\ 0, & \text{otherwise,} \end{cases} \\
x_3 &= \begin{cases} 1, & \text{if treatment A is used,} \\ 0, & \text{otherwise,} \end{cases} &
x_4 &= \begin{cases} 1, & \text{if treatment B is used,} \\ 0, & \text{otherwise,} \end{cases} \\
x_5 &= \begin{cases} 1, & \text{if treatment C is used,} \\ 0, & \text{otherwise.} \end{cases}
\end{align*}

We want to test the hypothesis that there are no differences among treatment means,
which is equivalent to $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. Thus, we must fit a complete and a
reduced model. (See Section 11.14.)

For the complete model, we have

\[
\mathbf{Y} = \begin{bmatrix} 10.1 \\ 12.2 \\ 11.9 \\ 11.4 \\ 12.9 \\ 12.7 \\ 9.9 \\ 12.3 \\ 11.4 \\ 12.1 \\ 13.4 \\ 12.9 \end{bmatrix}
\quad \text{and} \quad
\mathbf{X} = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]

A little matrix algebra yields, for this complete model,

\[
\text{SSE}_C = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = 1721.760 - 1721.225 = .535.
\]

The relevant reduced model is

\[
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
\]

and the corresponding $\mathbf{X}$ matrix consists of only the first three columns of the $\mathbf{X}$
matrix given for the complete model. We then obtain

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 12.225 \\ -1.350 \\ .475 \end{bmatrix}
\]

and

\[
\text{SSE}_R = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = 1721.760 - 1716.025 = 5.735.
\]

It follows that the F ratio appropriate to compare these complete and reduced
models is

\[
F = \frac{(\text{SSE}_R - \text{SSE}_C)/(k - g)}{\text{SSE}_C/(n - [k + 1])}
= \frac{(5.735 - .535)/(5 - 2)}{(.535)/(12 - 6)} = \frac{1.733}{.0892} = 19.4.
\]

The tabulated F for $\alpha = .05$, $\nu_1 = 3$, and $\nu_2 = 6$ is 4.76. Hence, if we choose
$\alpha = .05$, we reject the null hypothesis and conclude that the data present sufficient
evidence to indicate that differences exist among the treatment means. The associated
p-value is given by $P(F > 19.4)$. Table 7, Appendix 3, establishes that p-value < .005.
The applet F-Ratio Probabilities and Quantiles, applied with 3 numerator and
6 denominator degrees of freedom, yields p-value $= P(F > 19.4) = .00172$. The F
test used in this example is equivalent to the one that would have been produced by
the methods discussed in Section 13.9.


Although it provides a very useful technique, the linear model approach to ANOVA
calculation generally is used only when the computations are being done on a computer.
The calculation formulas given earlier in the chapter are more convenient for
hand calculation. Notice that if there are $k$ treatments involved in a study, the “dummy
variables” approach requires that we define $k - 1$ dummy variables if we wish to use
the linear model approach to analyze the data.
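
Since the linear model approach is intended for computer calculation, a minimal sketch of the complete and reduced fits for Example 13.11 may be helpful. It uses only the data of Table 13.7 and solves the normal equations by Gaussian elimination; the function names `solve` and `fit_sse` are hypothetical, and a statistical package would normally be used instead.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_sse(rows, y):
    """Least-squares fit via the normal equations; returns (SSE, beta)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return sum(e * e for e in resid), beta


# Table 13.7: responses for bolts I, II, III under each treatment.
data = {"A": [10.1, 12.2, 11.9], "B": [11.4, 12.9, 12.7],
        "C": [9.9, 12.3, 11.4], "D": [12.1, 13.4, 12.9]}

rows_complete, rows_reduced, y = [], [], []
for trt, values in data.items():
    for bolt, value in enumerate(values):          # 0, 1, 2 = bolts I, II, III
        x1, x2 = float(bolt == 0), float(bolt == 1)
        x3, x4, x5 = float(trt == "A"), float(trt == "B"), float(trt == "C")
        rows_complete.append([1.0, x1, x2, x3, x4, x5])
        rows_reduced.append([1.0, x1, x2])
        y.append(value)

sse_c, beta_c = fit_sse(rows_complete, y)
sse_r, beta_r = fit_sse(rows_reduced, y)

k, g, n = 5, 2, len(y)
f_stat = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
print(round(sse_c, 3), round(sse_r, 3), round(f_stat, 2))
```

The two error sums of squares agree with the values .535 and 5.735 obtained in the example, and the F ratio is about 19.4 on 3 and 6 degrees of freedom.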


Exercises 


Refer to Example 13.11. In Exercise 13.37, you interpreted the parameters in the model for a
randomized block design in terms of the mean response for each treatment in each block. In
terms of the model with dummy variables given in Example 13.11, $\beta_0$ is the mean response to
treatment D for bolt of material (block) III.


a In terms of the $\beta$-values, what is the mean response to treatment A in block III?


b Based on your answer to part (a), what is an interpretation of the parameter $\beta_3$?


Refer to Exercise 13.10.


a Answer the question posed in Exercise 13.10 by fitting complete and reduced linear models.
Test using $\alpha = .05$.

b Use the calculations for the complete model from part (a) to test the hypothesis that there
is no difference between the means for methods A and C. Test using $\alpha = .05$.


c Give the attained significance levels for the tests implemented in parts (a) and (b).


Refer to Exercise 13.42. Answer part (a) by fitting complete and reduced models.


Refer to Exercise 13.45. Answer part (b) by constructing an F test, using complete and reduced 
linear models. 


Summary 


The one-way layout (including the completely randomized design) and the random- 
ized block design are examples of experiments involving one and two qualitative 
independent variables, respectively. The ANOVA partitions the total sum of squares, 
Total SS, into portions associated with each independent variable and with experi- 
mental error. Mean squares associated with each independent variable may be com- 
pared with MSE, to see whether the mean squares are large enough to imply that 
the independent variable has an effect on the response. Confidence intervals for the 
mean response to an individual treatment or the difference in mean responses for 
two preselected treatments are straightforward modifications of intervals presented in 
previous chapters. The Bonferroni inequality was used to construct a set of confidence 
intervals with simultaneous confidence coefficient at least 1 − α. Finally, we introduced 
the dummy variable approach that permits the use of linear models methodology to 
implement an analysis of variance. 
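
The partition of Total SS summarized above can be sketched numerically. The following is a minimal illustration for a randomized block design with one observation per treatment-block cell; the function name `rbd_anova` and the data are hypothetical and are not taken from the text.

```python
def rbd_anova(data):
    """Partition Total SS for a randomized block design.

    data[j][i] is the response to treatment i in block j
    (b blocks as rows, k treatments as columns, one observation per cell).
    """
    b, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (b * k)
    trt_means = [sum(row[i] for row in data) / b for i in range(k)]
    blk_means = [sum(row) / k for row in data]

    # Each sum of squares is computed directly from its definition.
    total_ss = sum((y - grand_mean) ** 2 for row in data for y in row)
    sst = b * sum((m - grand_mean) ** 2 for m in trt_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in blk_means)
    sse = sum(
        (data[j][i] - trt_means[i] - blk_means[j] + grand_mean) ** 2
        for j in range(b)
        for i in range(k)
    )

    # Mean squares are compared with MSE to form the F statistics.
    mse = sse / ((b - 1) * (k - 1))
    return {
        "TotalSS": total_ss,
        "SST": sst,
        "SSB": ssb,
        "SSE": sse,
        "F_treatments": (sst / (k - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
example = [[10, 12, 14], [11, 14, 15], [9, 13, 16]]
result = rbd_anova(example)
```

Because SSE is computed from the residuals rather than by subtraction, comparing `result["TotalSS"]` with the sum of the other three entries checks the identity Total SS = SST + SSB + SSE for these data.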

In this chapter, we have presented a very brief introduction to the analysis of 
variance and its associated subject, the design of experiments. Experiments can be 
designed to investigate the effect of many quantitative and qualitative variables on 
a response. These may be variables of primary interest to the experimenter, as well 
as nuisance variables such as blocks, which may contribute unwanted variation that 
we attempt to separate from the experimental error. When properly designed, such 
experiments yield data that can be analyzed using an ANOVA approach. A more 
extensive coverage of the basic concepts of experimental design and the analysis of 
experiments is found in the references. 


Supplementary Exercises 


Assume that n = bk experimental units are available for use in an experiment used to compare 
k treatments. If blocks can be formed in a meaningful way, how should the experimental units 
in each block be identified? 


Refer to Exercise 13.73. 

a If a completely randomized design is employed, how would you select the experimental 
units that are assigned to the different treatments? 

b If a randomized block design is employed, how would you select the experimental units 
that are assigned to each of the k treatments? 


Three skin cleansing agents were used on three persons. For each person, three patches of skin 
were exposed to a contaminant and afterward cleansed by using one of the three cleansing 
agents. After 8 hours, the residual contaminant was measured, with the following results: 


SST = 1.18, SSB = .78, SSE = 2.24. 


a What are the experimental units, and what are the blocks in this experiment? 


b Test the hypothesis that there are no differences among the treatment means, using α = .05. 


Refer to Exercise 13.9. Suppose that the sand used in the mixes for samples 1—4 came from 
pit A, the sand used for samples 5-8 came from pit B, and the sand for samples 9-12 came 
from pit C. Analyze the data, assuming that the requirements for a randomized block are met 
with three blocks consisting, respectively, of samples 1, 2, 3, and 4; samples 5, 6, 7, and 8; and 
samples 9, 10, 11, and 12. 


a At the 5% significance level, is there evidence of differences in concrete strength due to 
the sand used? 


b Is there evidence, at the 5% significance level, of differences in average strength among 
the four types of concrete used? 


c Does the conclusion of part (b) contradict the conclusion that was obtained in Exercise 
13.9? 


Refer to Exercise 13.76. Let $\mu_A$ and $\mu_B$, respectively, denote the mean strengths of concrete 
specimens prepared from mix A and mix B. 


a Find a 95% confidence interval for $(\mu_A - \mu_B)$. 


b Is the interval found in part (a) the same interval found in Exercise 13.26(b)? Why or 
why not? 


A study was initiated to investigate the effect of two drugs, administered simultaneously, on 
reducing human blood pressure. It was decided to use three levels of each drug and to include 
all nine combinations in the experiment. Nine high-blood-pressure patients were selected for 
the experiment, and one was randomly assigned to each of the nine drug combinations. The 
response observed was a drop in blood pressure over a fixed interval of time. 


a Is this a randomized block design? 


b Suppose that two patients were randomly assigned to each of the nine drug combinations. 
What type of experimental design is this? 


Refer to Exercise 13.78. Suppose that a balanced completely randomized design is to be 
employed and that prior experimentation suggests that σ = 20. 


a How many replications would be required to estimate any treatment (drug combination) 
mean correct to within ±10 with probability .95? 

b How many degrees of freedom will be available for estimating σ² when using the number 
of replications determined in part (a)? 

c Give the approximate half-width of a 95% confidence interval for the difference in mean 
responses for two treatments when using the number of replications determined in part (a). 


A dealer has in stock three cars (models A, B, and C) of the same make but different models. 
Wishing to compare mileage obtained for these different models, a customer arranged to test 
each car with each of three brands of gasoline (brands X, Y, and Z). In each trial, a gallon of 
gasoline was added to an empty tank, and the car was driven without stopping until it ran out of 
gasoline. The accompanying table shows the number of miles covered in each of the nine trials. 


                       Distance (miles) 
Brand of Gasoline   Model A   Model B   Model C 
X                    22.4      17.0      19.2 
Y                    20.8      19.4      20.2 
Z                    21.5      18.7      21.2 


a Should the customer conclude that the different car models differ in mean gas mileage? 
Test at the α = .05 level. 


b Do the data indicate that the brand of gasoline affects gas mileage? 


Refer to Exercise 13.80. Suppose that the gas mileage is unrelated to the brand of gasoline. Carry 
out an analysis of the data appropriate for a completely randomized design with three treatments. 


a Should the customer conclude that the three cars differ in gas mileage? Test at the α = .05 
level. 


b Comparing your answer for Exercise 13.80(a) with your answer for part (a), can you suggest 
a reason why blocking may be unwise in certain cases? 


c Why might it be wrong to analyze the data in the manner suggested in part (a)? 


In the hope of attracting more riders, a city transit company plans to have express bus service 
from a suburban terminal to the downtown business district. These buses should save travel 
time. The city decides to perform a study of the effect of four different plans (such as a special 
bus lane and traffic signal progression) on the travel time for the buses. Travel times (in min- 
utes) are measured for several weekdays during a morning rush-hour trip while each plan is in 
effect. The results are recorded in the following table. 


Plan 


a What type of experimental design was employed? 

b Is there evidence of a difference in the mean travel times for the four plans? Use α = 0.01. 

c Form a 95% confidence interval for the difference between plan 1 (express lane) and plan 
3 (a control: no special travel arrangements). 


A study was conducted to compare the effect of three levels of digitalis on the level of calcium 
in the heart muscle of dogs. A description of the actual experimental procedure is omitted, but 
it is sufficient to note that the general level of calcium uptake varies from one animal to another 
so that comparison of digitalis levels (treatments) had to be blocked on heart muscles. That is, 
the tissue for a heart muscle was regarded as a block and comparisons of the three treatments 
were made within a given muscle. The calcium uptakes for the three levels of digitalis, A, B, 
and C, were compared based on the heart muscles of four dogs. The results are shown in the 
accompanying table. 


Dogs 
1 2 3 4 


A C B A 
1342 1698 1296 1150 


B B A C 
1608 1387 1029 1579 


C A C B 
1881 1140 1549 1319 


a Calculate the sums of squares for this experiment and construct an ANOVA table. 
b How many degrees of freedom are associated with SSE? 


c Do the data present sufficient evidence to indicate a difference in the mean uptake of calcium 
for the three levels of digitalis? 


d Do the data indicate a difference in the mean uptake in calcium for the heart muscles of 
the four dogs? 


e Give the standard deviation of the difference between the mean calcium uptakes for two 
levels of digitalis. 


f Find a 95% confidence interval for the difference in mean responses between treatments 
A and B. 


13.84 Refer to Exercise 13.83. Approximately how many replications are required for each level of 
digitalis (how many blocks) so that the error of estimating the difference in mean response for a 
pair of digitalis levels is less than 20, with probability .95? Assume that additional observations 
would be made within a randomized block design. 


13.85 An experiment was conducted to compare the effects of five stimuli on reaction time. 
Twenty-seven people were employed in the experiment, which was conducted using a 
completely randomized design. Regardless of the results of the ANOVA, it is desired to compare 
stimuli A and D. The reaction times (in seconds) were as shown in the accompanying table. 


                     Stimulus 
          A       B       C       D       E 
         .8      .7     1.2     1.0      .6 
         .6      .8     1.0      .9      .4 
         .6      .5      .9      .9      .4 
         .5      .5     1.2     1.1      .7 
                 .6     1.3      .7      .3 
                 .9      .8 
                 .7 
Total   2.5     4.7     6.4     4.6     2.4 
Mean    .625    .671   1.067    .920    .480 


a Conduct an ANOVA and test for a difference in mean reaction times due to the five stimuli. 
Give bounds for the p-value. 


b Compare stimuli A and D to see if there is a difference in mean reaction times. What can 
be said about the attained significance level? 


13.86 Because we would expect mean reaction time to vary from one person to another, the 
experiment in Exercise 13.85 might have been conducted more effectively by using a randomized 
block design with people as blocks. Hence, four people were used in a new experiment, and 
each person was subjected to each of the five stimuli in a random order. The reaction times (in 
seconds) were as shown in the accompanying table. Conduct an ANOVA and test for differences 
in mean reaction times for the five stimuli. 


                  Stimulus 
Subject     A      B      C      D      E 
   1       .7     .8    1.0    1.0     .5 
   2       .6     .6    1.1    1.0     .6 
   3       .9    1.0    1.2    1.1     .6 
   4       .6     .8     .9    1.0     .4 


Refer to Exercise 13.46. Construct confidence intervals to compare each of the ryegrass 
cultivars with Marvelgreen supreme in such a way that the simultaneous confidence coefficient is 
at least .95. Interpret the results. 


Show that 
Total SS = SST + SSB + SSE 


for a randomized block design, where 


Consider the following model for the responses measured in a randomized block design 
containing b blocks and k treatments: 


$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},$$


where $Y_{ij}$ = response to treatment i in block j, 
$\mu$ = overall mean, 
$\tau_i$ = nonrandom effect of treatment i, where $\sum_{i=1}^{k} \tau_i = 0$, 
$\beta_j$ = random effect of block j, where the $\beta_j$ are independent, normally 
distributed random variables with $E(\beta_j) = 0$ and $V(\beta_j) = \sigma_\beta^2$, for 
j = 1, 2, ..., b, 
$\varepsilon_{ij}$ = random error terms, where the $\varepsilon_{ij}$'s are independent, normally distributed 
random variables with $E(\varepsilon_{ij}) = 0$ and $V(\varepsilon_{ij}) = \sigma_\varepsilon^2$, for i = 1, 2, ..., k 
and j = 1, 2, ..., b. 


Further, assume that the $\beta_j$'s and $\varepsilon_{ij}$'s also are independent. This model differs from that 
presented in Section 13.8 in that the block effects are assumed to be random variables instead of 
fixed but unknown constants. 


a If the model just described is appropriate, show that observations taken from different 
blocks are independent of one another. That is, show that $Y_{ij}$ and $Y_{ij'}$ are independent if 
$j \ne j'$, as are $Y_{ij}$ and $Y_{i'j'}$ if $i \ne i'$ and $j \ne j'$. 

b Under the model just described, derive the covariance of two observations from the same 
block. That is, find $\mathrm{Cov}(Y_{ij}, Y_{i'j})$ if $i \ne i'$. 

c Two random variables that have a joint normal distribution are independent if and only if 
their covariance is 0. Use the result from part (b) to determine conditions under which two 
observations from the same block are independent of one another. 


Refer to the model for the randomized block design with random block effect given in 
Exercise 13.89. 


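a Give the expected value and variance of $Y_{ij}$. 

b Let $\bar{Y}_{i\bullet}$ denote the average of all of the responses to treatment i. Use the model for the 
randomized block design to derive $E(\bar{Y}_{i\bullet})$ and $V(\bar{Y}_{i\bullet})$. Is $\bar{Y}_{i\bullet}$ an unbiased estimator for the 
mean response to treatment i? Why or why not? Notice that $V(\bar{Y}_{i\bullet})$ depends on b and both 
$\sigma_\beta^2$ and $\sigma_\varepsilon^2$. 

c Consider $\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ for $i \ne i'$. Show that $E(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}) = \tau_i - \tau_{i'}$. This result implies that 
$\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet}$ is an unbiased estimator of the difference in the effects of treatments i and i'. 

d Derive $V(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet})$. Notice that $V(\bar{Y}_{i\bullet} - \bar{Y}_{i'\bullet})$ depends only on b and $\sigma_\varepsilon^2$. 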


Refer to the model for the randomized block design with random block effect given in 
Exercise 13.89 and let $\bar{Y}_{\bullet j}$ denote the average of all the responses in block j. Derive 

a $E(\bar{Y}_{\bullet j})$ and $V(\bar{Y}_{\bullet j})$. 

b E(MST). 

c E(MSB). 

d E(MSE). 


*13.92 Refer to the model for the randomized block design with random block effect given in Exercise 
13.89 and the results obtained in Exercise 13.91(c) and (d). Give an unbiased estimator for 

a $\sigma_\varepsilon^2$. 

b $\sigma_\beta^2$. 
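*13.93 Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu$ and 
variance $\sigma^2$. The independence of $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ and $\bar{Y}$ can be shown as follows. Define an 
n × n matrix A by 


$$A = \begin{bmatrix}
\dfrac{1}{\sqrt{n}} & \dfrac{1}{\sqrt{n}} & \dfrac{1}{\sqrt{n}} & \cdots & \dfrac{1}{\sqrt{n}} & \dfrac{1}{\sqrt{n}} \\
\dfrac{1}{\sqrt{2}} & \dfrac{-1}{\sqrt{2}} & 0 & \cdots & 0 & 0 \\
\dfrac{1}{\sqrt{2 \cdot 3}} & \dfrac{1}{\sqrt{2 \cdot 3}} & \dfrac{-2}{\sqrt{2 \cdot 3}} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
\dfrac{1}{\sqrt{(n-1)n}} & \dfrac{1}{\sqrt{(n-1)n}} & \dfrac{1}{\sqrt{(n-1)n}} & \cdots & \dfrac{1}{\sqrt{(n-1)n}} & \dfrac{-(n-1)}{\sqrt{(n-1)n}}
\end{bmatrix}$$


and notice that A'A = I, the identity matrix. Then, 

$$\sum_{i=1}^{n} Y_i^2 = Y'Y = Y'A'AY,$$

where Y is the vector of $Y_i$ values. 


a Show that 

$$AY = \begin{bmatrix} \sqrt{n}\,\bar{Y} \\ U_1 \\ U_2 \\ \vdots \\ U_{n-1} \end{bmatrix},$$

where $U_1, U_2, \ldots, U_{n-1}$ are linear functions of $Y_1, Y_2, \ldots, Y_n$. Thus, 

$$\sum_{i=1}^{n} Y_i^2 = n\bar{Y}^2 + \sum_{i=1}^{n-1} U_i^2.$$

b Show that the linear functions $\sqrt{n}\,\bar{Y}, U_1, U_2, \ldots, U_{n-1}$ are pairwise orthogonal and hence 
independent under the normality assumption. (See Exercise 5.130.) 


c Show that 

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n-1} U_i^2$$

and conclude that this quantity is independent of $\bar{Y}$. 

d Using the results of part (c), show that 

$$\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

has a χ² distribution with (n − 1) df. 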


13.94 Consider a one-way layout with k treatments. Assume that $Y_{ij}$ is the jth response for 
treatment (population) i and that $Y_{ij}$ is normally distributed with mean $\mu_i$ and variance $\sigma^2$, for 
i = 1, 2, ..., k and j = 1, 2, ..., $n_i$. 


a Use Exercise 13.93 to justify that $\bar{Y}_{1\bullet}, \bar{Y}_{2\bullet}, \ldots, \bar{Y}_{k\bullet}$ are independent of SSE. 


b Show that MST/MSE has an F distribution with $\nu_1 = k - 1$ and $\nu_2 = n_1 + n_2 + \cdots + n_k - k$ 
df under $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$. (You may assume, for simplicity, that 
$n_1 = n_2 = \cdots = n_k$.) 


A Description of the Experiment 


Many experiments result in measurements that are qualitative or categorical rather 
than quantitative like many of the measurements discussed in previous chapters. In 
these instances, a quality or characteristic is identified for each experimental unit. 
Data associated with such measurements can be summarized by providing the count 
of the number of measurements that fall into each of the distinct categories associated 
with the variable. For example, 


Employees can be classified into one of five income brackets. 

Mice might react in one of three ways when subjected to a stimulus. 

Motor vehicles might fall into one of four vehicle types. 

Paintings could be classified into one of k categories according to style and 
period. 

The quality of surgical incisions could most meaningfully be identified as 
excellent, very good, good, fair, or poor. 

Manufactured items are acceptable, seconds, or rejects. 


All the preceding examples exhibit, to a reasonable degree of approximation, the 
following characteristics, which define a multinomial experiment (see Section 5.9): 

1. The experiment consists of n identical trials. 

2. The outcome of each trial falls into exactly one of k distinct categories or cells. 

3. The probability that the outcome of a single trial will fall in a particular cell, 
cell i, is $p_i$, where i = 1, 2, ..., k, and remains the same from trial to trial. 
Notice that 


$$p_1 + p_2 + p_3 + \cdots + p_k = 1.$$


4. The trials are independent. 

5. We are interested in $n_1, n_2, n_3, \ldots, n_k$, where $n_i$ for i = 1, 2, ..., k is equal 
to the number of trials for which the outcome falls into cell i. Notice that 
$n_1 + n_2 + n_3 + \cdots + n_k = n$. 


This experiment is analogous to tossing n balls at k boxes, where each ball must 
fall into exactly one of the boxes. The probability that a ball will fall into a box varies 
from box to box but remains the same for each box in repeated tosses. Finally, the 
balls are tossed in such a way that the trials are independent. At the conclusion of 
the experiment, we observe $n_1$ balls in the first box, $n_2$ in the second, ..., and $n_k$ in 
the kth. The total number of balls is $n = n_1 + n_2 + n_3 + \cdots + n_k$. 
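
The balls-and-boxes analogy can be simulated directly. The sketch below is only an illustration: the function name `toss_balls`, the cell probabilities, and the seed are assumptions chosen for the example and do not come from the text.

```python
import random
from collections import Counter


def toss_balls(n, probs, seed=0):
    """Simulate a multinomial experiment: n independent trials, each
    falling into exactly one of k cells with fixed probabilities."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    cells = list(range(1, len(probs) + 1))
    outcomes = rng.choices(cells, weights=probs, k=n)
    tally = Counter(outcomes)
    # Return the cell counts n_1, ..., n_k in cell order.
    return [tally.get(cell, 0) for cell in cells]


# Hypothetical cell probabilities for k = 3 boxes; they sum to 1.
cell_probs = [0.1, 0.3, 0.6]
counts = toss_balls(100, cell_probs)
```

Whatever counts a particular run produces, they are nonnegative and add to n, which is characteristic 5 of the multinomial experiment.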

Notice the similarity between the binomial and the multinomial experiments and, 
in particular, that the binomial experiment represents the special case for the 
multinomial experiment when k = 2. The two-cell probabilities, p and q = 1 − p, of the 
binomial experiment are replaced by the k-cell probabilities, $p_1, p_2, \ldots, p_k$, of the 
multinomial experiment. The objective of this chapter is to make inferences about 
the cell probabilities $p_1, p_2, \ldots, p_k$. The inferences will be expressed in terms of 
statistical tests of hypotheses concerning the specific numerical values of the cell 
probabilities or their relationship one to another. 

Because the calculation of multinomial probabilities is somewhat cumbersome, 
it would be difficult to calculate the exact significance levels (probabilities of type I 
errors) for hypotheses regarding the values of $p_1, p_2, \ldots, p_k$. Fortunately, we have 
been relieved of this chore by the British statistician Karl Pearson, who proposed a 
very useful test statistic for testing hypotheses concerning $p_1, p_2, \ldots, p_k$ and gave the 
approximate sampling distribution of this statistic. We will outline the construction 
of Pearson's test statistic in the following section. 


The Chi-Square Test 


Suppose that n = 100 balls were tossed at the cells (boxes) and that we knew that $p_1$ 
was equal to .1. How many balls would be expected to fall into cell 1? Referring to 
Section 5.9, recall that $n_1$ has a (marginal) binomial distribution with parameters n 
and $p_1$, and that 


$$E(n_1) = np_1 = (100)(.1) = 10.$$


In like manner, each of the $n_i$'s has a binomial distribution with parameters n and $p_i$, 
and the expected number falling into cell i is 


$$E(n_i) = np_i, \qquad i = 1, 2, \ldots, k.$$

Now suppose that we hypothesize values for $p_1, p_2, \ldots, p_k$ and calculate the 
expected value for each cell. Certainly if our hypothesis is true, the cell counts $n_i$ 
should not deviate greatly from their expected values $np_i$ for i = 1, 2, ..., k. Hence, 
it would seem intuitively reasonable to use a test statistic involving the k deviations, 


$$n_i - E(n_i) = n_i - np_i, \qquad \text{for } i = 1, 2, \ldots, k.$$


In 1900 Karl Pearson proposed the following test statistic, which is a function of the 
squares of the deviations of the observed counts from their expected values, weighted 
by the reciprocals of their expected values: 


$$X^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}.$$


Although the mathematical proof is beyond the scope of this text, it can be shown 
that when n is large, X² has an approximate chi-square (χ²) probability distribution. 
We can easily demonstrate this result for the case k = 2, as follows. If k = 2, then 
$n_2 = n - n_1$ and $p_1 + p_2 = 1$. Thus, 


$$\begin{aligned}
X^2 &= \sum_{i=1}^{2} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(n_1 - np_1)^2}{np_1} + \frac{(n_2 - np_2)^2}{np_2} \\
&= \frac{(n_1 - np_1)^2}{np_1} + \frac{[(n - n_1) - n(1 - p_1)]^2}{n(1 - p_1)} \\
&= \frac{(n_1 - np_1)^2}{np_1} + \frac{(-n_1 + np_1)^2}{n(1 - p_1)} \\
&= (n_1 - np_1)^2 \left( \frac{1}{np_1} + \frac{1}{n(1 - p_1)} \right) = \frac{(n_1 - np_1)^2}{np_1(1 - p_1)}.
\end{aligned}$$


We have seen (Section 7.5) that for large n 

$$\frac{n_1 - np_1}{\sqrt{np_1(1 - p_1)}}$$

has approximately a standard normal distribution. Since the square of a standard 
normal random variable has a χ² distribution (see Example 6.11), for k = 2 and large 
n, X² has an approximate χ² distribution with 1 degree of freedom (df). 
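
The k = 2 identity can be checked numerically. The following sketch uses hypothetical values (n = 100, an observed count of 16 in cell 1, and p1 = .1) and a helper named `pearson_x2` introduced only for this illustration.

```python
import math


def pearson_x2(counts, probs):
    """Pearson's statistic: sum over cells of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


# Hypothetical two-cell experiment.
n, n1, p1 = 100, 16, 0.1

# X^2 computed from both cells, with n_2 = n - n_1 and p_2 = 1 - p_1.
x2 = pearson_x2([n1, n - n1], [p1, 1 - p1])

# The standardized binomial count (n_1 - n p_1) / sqrt(n p_1 (1 - p_1)).
z = (n1 - n * p1) / math.sqrt(n * p1 * (1 - p1))
```

For these values the two cell contributions are 36/10 and 36/90, so `x2` is 4, which agrees with `z ** 2` since z = 6/3 = 2.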

Experience has shown that the cell counts $n_i$ should not be too small if the χ² 
distribution is to provide an adequate approximation to the distribution of X². As a 
rule of thumb, we will require that all expected cell counts are at least five, although 
Cochran (1952) has noted that this value can be as low as one for some situations. 

You will recall the use of the χ² probability distribution for testing a hypothesis 
concerning a population variance σ² in Section 10.9. In particular, we have seen that 
the shape of the χ² distribution and the associated quantiles and tail areas differ 
considerably depending on the number of degrees of freedom (see Table 6, Appendix 3). 
Therefore, if we want to use X² as a test statistic, we must know the number of degrees 
of freedom associated with the approximating χ² distribution and whether to use a 
one-tailed or two-tailed test in locating the rejection region for the test. The latter 
problem may be solved directly. Because large differences between the observed and 
expected cell counts contradict the null hypothesis, we will reject the null hypothesis 
when X² is large and employ an upper-tailed statistical test. 

The determination of the appropriate number of degrees of freedom to be employed 
for the test can be a little tricky and therefore will be specified for the physical 
applications described in the following sections. In addition, we will state the principle 
involved (which is fundamental to the mathematical proof of the approximation) 
so that you will understand why the number of degrees of freedom changes with 
various applications. This principle states that the appropriate number of degrees of 
freedom will equal the number of cells, k, less 1 df for each independent linear 
restriction placed on the cell probabilities. For example, one linear restriction is always 
present because the sum of the cell probabilities must equal 1; that is, 


$$p_1 + p_2 + p_3 + \cdots + p_k = 1.$$


Other restrictions will be introduced for some applications because of the necessity 
for estimating unknown parameters required in the calculation of the expected cell 
frequencies or because of the method used to collect the sample. When unknown 
parameters must be estimated in order to compute X², a maximum-likelihood 
estimator (MLE) should be employed. The degrees of freedom for the approximating χ² 
distribution is reduced by 1 for each parameter estimated. These cases will arise as 
we consider various practical examples. 


A Test of a Hypothesis Concerning 
Specified Cell Probabilities: 
A Goodness-of-Fit Test 


The simplest hypothesis concerning the cell probabilities is one that specifies 
numerical values for each. In this case, we are testing 
$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{i,0}$ denotes a specified value for $p_i$. 
The alternative is the general one that states that at least one of the equalities does not 
hold. Because the only restriction on the cell probabilities is that $\sum_{i=1}^{k} p_i = 1$, the X² test 
statistic has approximately a χ² distribution with k − 1 df. 


EXAMPLE 14.1 


A group of rats, one by one, proceed down a ramp to one of three doors. We wish to 
test the hypothesis that the rats have no preference concerning the choice of a door. 
Thus, the appropriate null hypothesis is 


$$H_0: p_1 = p_2 = p_3 = \frac{1}{3},$$


where $p_i$ is the probability that a rat will choose door i, for i = 1, 2, or 3. 
Suppose that the rats were sent down the ramp n = 90 times and that the three 
observed cell frequencies were $n_1 = 23$, $n_2 = 36$, and $n_3 = 31$. The expected cell 
frequencies are the same for each cell: $E(n_i) = np_i = (90)(1/3) = 30$. The observed 
and expected cell frequencies are presented in Table 14.1. Notice the discrepancy 
between the observed and expected cell frequencies. Do the data present sufficient 
evidence to warrant rejection of the hypothesis of no preference? 


Table 14.1 Observed and expected cell counts 

                                   Door 
Value                        1           2           3 
Observed cell frequency   n1 = 23     n2 = 36     n3 = 31 
Expected cell frequency    (30)        (30)        (30) 


Solution 


The χ² test statistic for our example will possess (k − 1) = 2 df since the only 
restriction on the cell probabilities is that 

$$p_1 + p_2 + p_3 = 1.$$

Therefore, if we choose α = .05, we would reject the null hypothesis when X² > 
5.991 (see Table 6, Appendix 3). 
Substituting into the formula for X², we obtain 


$$X^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}
= \frac{(23 - 30)^2}{30} + \frac{(36 - 30)^2}{30} + \frac{(31 - 30)^2}{30} = 2.87.$$


Because X² is less than the tabulated critical value of χ², the null hypothesis is not 
rejected, and we conclude that the data do not present sufficient evidence to indicate 
that the rats have a preference for any of the doors. In this case, the p-value is given 
by p-value = P(χ² > 2.87), where χ² possesses a χ² distribution with k − 1 = 2 df. 
Using Table 6, Appendix 3, it follows that p-value > 0.10. The applet Chi-Square 
Probability and Quantiles gives p-value = P(χ² > 2.87) = .23812. 
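
The arithmetic of Example 14.1 can be reproduced with a short script. This is a sketch rather than part of the original example; the helper name `pearson_x2` is introduced here, and the p-value line relies on the fact that a χ² variable with exactly 2 df has upper-tail probability exp(−x/2), so it does not generalize to other degrees of freedom.

```python
import math


def pearson_x2(observed, probs):
    """Pearson's X^2 for observed cell counts and hypothesized cell probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


observed = [23, 36, 31]              # door counts from Example 14.1
hypothesized = [1 / 3, 1 / 3, 1 / 3]  # H0: no preference among doors

x2 = pearson_x2(observed, hypothesized)
df = len(observed) - 1

# Upper-tail probability of a chi-square variable with 2 df (closed form).
p_value = math.exp(-x2 / 2)

# Critical value quoted in the example for alpha = .05 and 2 df.
reject = x2 > 5.991
```

The statistic is 86/30, which rounds to the 2.87 of the example. The p-value computed from the unrounded statistic is slightly larger than the applet value .23812, which was evaluated at 2.87.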


The χ² statistic also can be used to test whether sample data indicate that a specific 
model for a population distribution does not fit the data. An example of such a test, 
called the goodness-of-fit test, is given in the following example. 


EXAMPLE 14.2 


The number of accidents Y per week at an intersection was checked for n = 50 
weeks, with the results as shown in Table 14.2. Test the hypothesis that the random 
variable Y has a Poisson distribution, assuming the observations to be independent. 
Use α = .05. 


Table 14.2 Data for Example 14.2 

y             Frequency 
0                32 
1                12 
2                 6 
3 or more         0 


Solution 


The null hypothesis $H_0$ states that Y has the Poisson distribution, given by 


$$p(y \mid \lambda) = \frac{\lambda^y e^{-\lambda}}{y!}, \qquad y = 0, 1, 2, \ldots.$$


Because λ is unknown, we must find its MLE. In Exercise 9.80, we established that 
the MLE of λ is $\hat{\lambda} = \bar{Y}$. For the given data, $\hat{\lambda}$ has the value $\bar{y} = 24/50 = .48$. 

We have, for the given data, three cells with five or more observations: the cells 
defined by Y = 0, Y = 1, and Y ≥ 2. Under $H_0$, the probabilities for these cells are 


$$p_1 = P(Y = 0) = e^{-\lambda}, \qquad p_2 = P(Y = 1) = \lambda e^{-\lambda}, \qquad
p_3 = P(Y \ge 2) = 1 - e^{-\lambda} - \lambda e^{-\lambda}.$$

These probabilities are estimated by replacing λ with $\hat{\lambda}$, which gives 

$$\hat{p}_1 = e^{-.48} = .619, \qquad \hat{p}_2 = .48e^{-.48} = .297, \qquad \hat{p}_3 = 1 - \hat{p}_1 - \hat{p}_2 = .084.$$


If the observations are independent, the cell frequencies $n_1$, $n_2$, and $n_3$ have a 
multinomial distribution with parameters $p_1$, $p_2$, and $p_3$. Thus, $E(n_i) = np_i$, and the 
estimated expected cell frequencies are given by 


$$\hat{E}(n_1) = n\hat{p}_1 = 30.95, \qquad \hat{E}(n_2) = n\hat{p}_2 = 14.85, \qquad \hat{E}(n_3) = n\hat{p}_3 = 4.20.$$

Thus, the test statistic is given by 

$$X^2 = \sum_{i=1}^{3} \frac{[n_i - \hat{E}(n_i)]^2}{\hat{E}(n_i)},$$

which has approximately a χ² distribution with (k − 2) = 1 df. (One degree of 
freedom is lost because λ had to be estimated, the other, because $\sum_{i=1}^{3} p_i = 1$.) 
On computing X² we find 


$$X^2 = \frac{(32 - 30.95)^2}{30.95} + \frac{(12 - 14.85)^2}{14.85} + \frac{(6 - 4.20)^2}{4.20} = 1.354.$$


Because $\chi^2_{.05} = 3.841$, with 1 df, we do not reject $H_0$. The data do not present sufficient 
evidence to contradict our hypothesis that Y possesses a Poisson distribution. The 
p-value is given by P(χ² > 1.354). Table 6, Appendix 3, gives p-value > .10 
whereas the applet Chi-Square Probability and Quantiles establishes that p-value = 
.24458. Unless a very large value of α is used (α > .24458), there is insufficient 
evidence to reject the claim that the number of accidents per week has a Poisson 
distribution. 
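
The steps of Example 14.2 can be sketched in code. The variable names below are introduced only for this illustration. Two points are worth flagging: the script does not round the estimated cell probabilities, so its statistic (about 1.34) differs slightly from the 1.354 obtained with the rounded values, and the p-value line uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 df.

```python
import math

# Table 14.2: the "2" cell stands for Y >= 2, since no week had 3 or more accidents.
frequencies = {0: 32, 1: 12, 2: 6}
n = sum(frequencies.values())

# MLE of lambda is the sample mean: (0*32 + 1*12 + 2*6) / 50.
lam_hat = sum(y * f for y, f in frequencies.items()) / n

# Estimated cell probabilities under the Poisson model.
p1 = math.exp(-lam_hat)              # P(Y = 0)
p2 = lam_hat * math.exp(-lam_hat)    # P(Y = 1)
p3 = 1 - p1 - p2                     # P(Y >= 2)

observed = [frequencies[0], frequencies[1], frequencies[2]]
expected = [n * p for p in (p1, p2, p3)]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# k - 2 df: one lost to the sum-to-one restriction, one to estimating lambda.
df = len(observed) - 2

# Upper-tail probability of a chi-square variable with 1 df.
p_value = math.erfc(math.sqrt(x2 / 2))

# Critical value quoted in the example for alpha = .05 and 1 df.
reject = x2 > 3.841
```

The conclusion matches the example: the statistic is well below 3.841, so the Poisson model is not rejected.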


Exercises 


14.1 Historically, the proportions of all Caucasians in the United States with blood phenotypes A, 
B, AB, and O are .41, .10, .04, and .45, respectively. To determine whether current population 
proportions still match these historical values, a random sample of 200 American Caucasians 
were selected, and their blood phenotypes were recorded. The observed numbers with each 
phenotype are given in the following table. 


 A     B     AB    O 
89    18    12    81 


a Is there sufficient evidence, at the .05 level of significance, to claim that current proportions 
differ from the historic values? 

b Applet Exercise Use the applet Chi-Square Probability and Quantiles to find the p-value 
associated with the test in part (a). 


14.2 Previous enrollment records at a large university indicate that of the total number of persons 
who apply for admission, 60% are admitted unconditionally, 5% are conditionally admitted, 
and the remainder are refused admission. Of 500 applicants to date for next year, 329 were 
admitted unconditionally, 43 were conditionally admitted, and the remainder were not admitted. 
Do the data indicate a departure from previous admission rates? 


a Test using α = .05. 


b Applet Exercise Use the applet Chi-Square Probability and Quantiles to find the p-value 
associated with the test in part (a). 


14.3 A city expressway with four lanes in each direction was studied to see whether drivers preferred 
to drive on the inside lanes. A total of 1000 automobiles were observed during the heavy 
early-morning traffic, and their respective lanes were recorded. The results are shown in the 
accompanying table. Do the data present sufficient evidence to indicate that some lanes are 
preferred over others? (Test the hypothesis that $p_1 = p_2 = p_3 = p_4 = 1/4$, using α = .05.) 
Give bounds for the associated p-value. 


Lane | 1 2 3 4 


Count | 294 276 238 192 


14.4 Do you hate Mondays? Researchers in Germany have provided another reason for you: They 
concluded that the risk of heart attack on a Monday for a working person may be as much as 
50% greater than on any other day.¹ The researchers kept track of heart attacks and coronary 
arrests over a period of 5 years among 330,000 people who lived near Augsberg, Germany. In an 
attempt to verify the researchers' claim, 200 working people who had recently had heart attacks 
were surveyed. The days on which their heart attacks occurred appear in the following table. 


Sunday Monday Tuesday Wednesday Thursday Friday Saturday 
24 36 27 26 32 26 29 


Do these data present sufficient evidence to indicate that there is a difference in the percentages 
of heart attacks that occur on different days of the week? Test using α = .05. 


14.5 After inspecting the data in Exercise 14.4, you might wish to test the hypothesis that the 
probability that a heart attack victim suffered a heart attack on Monday is 1/7 against the 
alternative that this probability is greater than 1/7. 


a Carry out the test above, using α = .05. 

b What tenet of good statistical practice is violated in the test in part (a)? 

c Prior to looking at the current data, is there a reason that you might legitimately consider 
the hypotheses from part (a)? 


1. Source: Daniel Q. Haney, “Mondays May Be Hazardous,” Press-Enterprise (Riverside, Calif.), 17 
November 1992, p. A16. 


Suppose that the assumptions associated with a multinomial experiment are all satisfied. Then 
(see Section 5.9) each of the $n_i$'s, i = 1, 2, ..., k, has a binomial distribution with parameters 
n and $p_i$. Further, $\mathrm{Cov}(n_i, n_j) = -np_ip_j$ if $i \ne j$. 

a What is $E(n_i - n_j)$? 

b Refer to part (a). Give an unbiased estimator for $p_i - p_j$. 

c Show that $V(n_i - n_j) = n[p_i(1 - p_i) + p_j(1 - p_j) + 2p_ip_j]$. 

d Refer to part (c). What is the variance of the unbiased estimator that you gave in part (b)? 

e Give a consistent estimator for $n^{-1} V(n_i - n_j)$. 

f If n is large, the estimator that you gave in part (b) is approximately normally distributed 
with mean $p_i - p_j$ and variance $n^{-2} V(n_i - n_j)$. If $\hat{p}_i = n_i/n$ and $\hat{p}_j = n_j/n$, show that 
a large sample (1 − α)100% confidence interval for $p_i - p_j$ is given by 


$$\hat{p}_i - \hat{p}_j \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i(1 - \hat{p}_i) + \hat{p}_j(1 - \hat{p}_j) + 2\hat{p}_i\hat{p}_j}{n}}.$$
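
The interval in part (f) is straightforward to evaluate once the counts are known. The sketch below assumes a 95% interval, for which the standard normal quantile is 1.96; the function name `diff_ci` and the counts are hypothetical and are not taken from any exercise.

```python
import math


def diff_ci(n_i, n_j, n, z=1.96):
    """Large-sample confidence interval for p_i - p_j in a multinomial
    experiment with n trials; z = 1.96 corresponds to 95% confidence."""
    p_i, p_j = n_i / n, n_j / n
    # Estimated variance of (n_i - n_j)/n, including the covariance term.
    variance = (p_i * (1 - p_i) + p_j * (1 - p_j) + 2 * p_i * p_j) / n
    half_width = z * math.sqrt(variance)
    estimate = p_i - p_j
    return estimate - half_width, estimate + half_width


# Hypothetical counts: 30 and 20 observations in two cells out of n = 100.
low, high = diff_ci(30, 20, 100)
```

With these counts the estimated variance is .49/100, so the half-width is 1.96(.07) = .1372 and the interval is roughly (−.037, .237). The term 2p̂ᵢp̂ⱼ reflects the negative covariance between cell counts; dropping it would understate the variance.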


Refer to Exercise 14.3. Lane 1 is the “slow” lane and lane 4 is the “fast” lane. Use the confidence 
interval formula given in Exercise 14.6(f) to give a 95% confidence interval for $p_1 - p_4$. Would 
you conclude that a greater proportion drive in the slow lane than in the fast lane? Why? 


The Mendelian theory states that the number of a type of peas that fall into the classifications 
round and yellow, wrinkled and yellow, round and green, and wrinkled and green should be in the 
ratio 9:3:3:1. Suppose that 100 such peas revealed 56, 19, 17, and 8 in the respective categories. 
Are these data consistent with the model? Use $\alpha = .05$. (The expression 9:3:3:1 means that
9/16 of the peas should be round and yellow, 3/16 should be wrinkled and yellow, etc.) 


Refer to Exercise 14.6(f) and to the data in Exercise 14.8. 


a Give a 95% confidence interval for the difference in the proportions of round-yellow and
round-green peas.


b Construct, using the Bonferroni method discussed in Section 13.12, simultaneous confi- 
dence intervals to compare the proportion of round-yellow peas with the proportions of 
peas in each of the other three categories. The intervals are to have simultaneous confidence 
coefficient at least .95. 


Two types of defects, A and B, are frequently seen in the output of a manufacturing process.
Each item can be classified into one of the four classes: $A \cap B$, $A \cap \bar{B}$, $\bar{A} \cap B$, and $\bar{A} \cap \bar{B}$, where
$\bar{A}$ denotes the absence of the type A defect. For 100 inspected items, the following frequencies
were observed:


$A \cap B$: 48, $A \cap \bar{B}$: 18, $\bar{A} \cap B$: 21, $\bar{A} \cap \bar{B}$: 13.


Is there sufficient evidence to indicate that the four categories, in the order listed, do not occur
in the ratio 5:2:2:1? (Use $\alpha = .05$.)


The data in the following table are the frequency counts for 400 observations on the number 
of bacterial colonies within the field of a microscope, using samples of milk film.² Is there
sufficient evidence to claim that the data do not fit the Poisson distribution? (Use $\alpha = .05$.)


2. Source: C. A. Bliss and R. A. Fisher, “Fitting the Negative Binomial Distribution to Biological Data,” 
Biometrics 9 (1953): 176—200. Biometrics Society. All rights reserved. 
Number of Colonies    Frequency of
per Field             Observation

 0                     56
 1                    104
 2                     80
 3                     62
 4                     42
 5                     27
 6                      9
 7                      9
 8                      5
 9                      3
10                      2
11                      0
19                      1
Total                 400


Text not available due to copyright restrictions 


Contingency Tables 


A problem frequently encountered in the analysis of count data concerns assessment 
of the independence of two methods for classification of subjects. For example, we 
might classify a sample of people by gender and by opinion on a political issue 
in order to test the hypothesis that opinions on the issue are independent of gender. 
Analogously, we might classify patients suffering from a disease according to the type 
of medication and their rate of recovery in order to see if recovery rate depends on the 
type of medication. In each of these examples, we wish to investigate a dependency 
(or contingency) between two classification criteria. 

Suppose that we wish to classify defects found on furniture produced in a manufac- 
turing plant according to (1) the type of defect and (2) the production shift. A total of 


Text not available due to copyright restrictions 
Table 14.3 A contingency table

                          Type of Defect
Shift    A             B             C             D             Total

1        15 (22.51)    21 (20.99)    45 (38.94)    13 (11.56)     94
2        26 (22.99)    31 (21.44)    34 (39.77)     5 (11.81)     96
3        33 (28.50)    17 (26.57)    49 (49.29)    20 (14.63)    119

Total    74            69            128           38            309


n = 309 furniture defects was recorded and the defects were classified as one of four 
types, A, B, C, or D. At the same time each piece of furniture was identified according 
to the production shift during which it was manufactured. These counts are presented 
in Table 14.3, an example of a contingency table. (As you will subsequently see, the 
numbers in parentheses are the estimated expected cell frequencies.) Our objective 
is to test the null hypothesis that type of defect is independent of shift against the 
alternative that the two categorization schemes are dependent. That is, we wish to test
$H_0$: column classification is independent of row classification.

Let $p_A$ equal the unconditional probability that a defect is of type A. Similarly,
define $p_B$, $p_C$, and $p_D$ as the probabilities of observing the three other types of defects.
Then these probabilities, which we will call the column probabilities of Table 14.3,
satisfy the requirement


$$p_A + p_B + p_C + p_D = 1.$$


In like manner, let $p_i$ for $i = 1, 2,$ or $3$ equal the row probabilities that a defective
item was produced on shift $i$, where


$$p_1 + p_2 + p_3 = 1.$$


If the two classifications are independent of each other, each cell probability equals 
the product of its respective row and column probabilities. For example, the probability
that a defect will occur on shift 1 and be of type A is $p_1 \times p_A$. We observe that the
numerical values of the cell probabilities are unspecified in the problem under consid- 
eration. The null hypothesis specifies only that each cell probability equals the product 
of its respective row and column probabilities and thereby implies independence of 
the two classifications. 

The analysis of the data obtained from a contingency table differs from the analysis 
in Example 14.1 because we must estimate the row and column probabilities in order 
to estimate the expected cell frequencies. The estimated expected cell frequencies 
may be substituted for the $E(n_i)$ in $X^2$, and $X^2$ will continue to possess a distribution
that is well approximated by a $\chi^2$ probability distribution.

The MLE for any row or column probability is found as follows. Let $n_{ij}$ denote
the observed frequency in row $i$ and column $j$ of the contingency table and let $p_{ij}$
denote the probability of an observation falling into this cell. If observations are
independently selected, then the cell frequencies have a multinomial distribution, and
the MLE of $p_{ij}$ is simply the observed relative frequency for that cell. That is,

$$\hat{p}_{ij} = \frac{n_{ij}}{n}, \qquad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, c$$

(see Exercise 9.87).
Likewise, viewing row $i$ as a single cell, the probability for row $i$ is given by $p_i$,
and if $r_i$ denotes the number of observations in row $i$,

$$\hat{p}_i = \frac{r_i}{n}$$

is the MLE of $p_i$.

By analogous arguments, the MLE of the $j$th-column probability is $c_j/n$, where
$c_j$ denotes the number of observations in column $j$.

Under the null hypothesis, the MLE of the expected value of $n_{11}$ is

$$\widehat{E(n_{11})} = n(\hat{p}_1 \times \hat{p}_A) = n\left(\frac{r_1}{n}\right)\left(\frac{c_1}{n}\right) = \frac{r_1 c_1}{n}.$$

Analogously, if the null hypothesis is true, the estimated expected value of the cell
frequency $n_{ij}$ for a contingency table is equal to the product of its respective row and
column totals divided by the total sample size. That is,

$$\widehat{E(n_{ij})} = \frac{r_i c_j}{n}.$$


The estimated expected cell frequencies for our example are shown in parentheses in
Table 14.3. For example,

$$\widehat{E(n_{11})} = \frac{r_1 c_1}{n} = \frac{94(74)}{309} = 22.51.$$

We may now use the expected and observed cell frequencies shown in Table 14.3
to calculate the value of the test statistic:

$$X^2 = \sum_{j=1}^{4} \sum_{i=1}^{3} \frac{[n_{ij} - \widehat{E(n_{ij})}]^2}{\widehat{E(n_{ij})}}
= \frac{(15 - 22.51)^2}{22.51} + \frac{(26 - 22.99)^2}{22.99} + \cdots + \frac{(20 - 14.63)^2}{14.63}
= 19.17.$$
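The calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the names `expected_counts` and `pearson_x2` are illustrative helpers introduced here, not anything defined in the text. Because it keeps the expected counts unrounded, its result agrees with the reported 19.17 only to within about 0.01.

```python
# Observed counts from Table 14.3 (rows: shifts 1-3; columns: defect types A-D).
observed = [
    [15, 21, 45, 13],
    [26, 31, 34, 5],
    [33, 17, 49, 20],
]


def expected_counts(table):
    """Estimated expected counts r_i * c_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson's X^2: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
x2 = pearson_x2(observed)

# Print the expected counts to two decimals, as in Table 14.3, then X^2.
for row in expected:
    print([round(e, 2) for e in row])
print(round(x2, 2))
```

The rounded expected counts printed by the script can be compared cell by cell with the values in parentheses in Table 14.3.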


The only remaining obstacle involves the determination of the appropriate number 
of degrees of freedom associated with the test statistic. We will give this as a rule, which 
we will subsequently justify. The degrees of freedom associated with a contingency 
table possessing $r$ rows and $c$ columns will always equal $(r - 1)(c - 1)$. For our
example, we will compare $X^2$ with the critical value of $\chi^2$ with $(r - 1)(c - 1) =
(3 - 1)(4 - 1) = 6$ df.

You will recall that the number of degrees of freedom associated with the $\chi^2$
statistic will equal the number of cells (in this case, $k = r \times c$) less 1 df for each
independent linear restriction placed on the cell probabilities. The total number of
cells for the data of Table 14.3 is $k = 12$. From this we subtract 1 df because the sum
of the cell probabilities must equal 1; that is,


$$p_{11} + p_{12} + \cdots + p_{34} = 1.$$


In addition, we used the cell frequencies to estimate two of the three row probabilities. 
Notice that the estimate of the third-row probability is determined once we have 
estimated $p_1$ and $p_2$, because


$$p_1 + p_2 + p_3 = 1.$$


Thus, we lose $3 - 1 = 2$ df for estimating the row probabilities.

Finally, we used the cell frequencies to estimate $(c - 1) = 3$ column probabilities,
and therefore we lose $(c - 1) = 3$ additional degrees of freedom. The total number
of degrees of freedom remaining is


$$\text{df} = 12 - 1 - 2 - 3 = 6 = (3 - 1)(4 - 1).$$


In general, we see that the total number of degrees of freedom associated with an
$r \times c$ contingency table is


$$\text{df} = rc - 1 - (r - 1) - (c - 1) = (r - 1)(c - 1).$$


Therefore, in our example relating shift to type of furniture defect, if we use
$\alpha = .05$, we will reject the null hypothesis that the two classifications are independent
if $X^2 > 12.592$. Because the value of the test statistic, $X^2 = 19.17$, exceeds the critical
value of $\chi^2$, we reject the null hypothesis at the $\alpha = .05$ level of significance. The
associated p-value is given by p-value $= P(\chi^2 > 19.17)$. Bounds on this probability
can be obtained using Table 6, Appendix 3, from which it follows that p-value < .005.
The applet Chi-Square Probability and Quantiles gives the exact p-value = .00389.
Thus, for any value of $\alpha$ greater than or equal to .00389, the data present sufficient
evidence to indicate dependence between defect type and manufacturing shift. A study 
of the production operations for the three shifts would probably reveal the cause. 
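The p-value quoted above can also be reproduced without tables or the applet. The sketch below relies on a standard closed form for the upper tail of a $\chi^2$ distribution with an even number of degrees of freedom; the function name `chi2_upper_tail_even_df` is a hypothetical helper introduced for illustration, and the statistic and critical value are simply the ones reported in the text.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df > x), via the closed-form Poisson sum.

    For df = 2m, P(chi2 > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    m = df // 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))


x2 = 19.17          # test statistic reported in the text
critical = 12.592   # chi-square .05 critical value with 6 df, from the text
df = (3 - 1) * (4 - 1)

p_value = chi2_upper_tail_even_df(x2, df)
print(x2 > critical)       # True means H0 is rejected at alpha = .05
print(round(p_value, 5))
```

Evaluating the same function at the critical value 12.592 returns a tail area close to .05, which is a quick consistency check on both the formula and the tabled critical value.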


EXAMPLE 14.3


A survey was conducted to evaluate the effectiveness of a new flu vaccine that had
been administered in a small community. The vaccine was provided free of charge in
a two-shot sequence over a period of 2 weeks to those wishing to avail themselves
of it. Some people received the two-shot sequence, some appeared only for the first
shot, and the others received neither.

A survey of 1000 local inhabitants in the following spring provided the information
shown in Table 14.4. Do the data present sufficient evidence to indicate a dependence
between the two classifications, vaccine category and occurrence or nonoccurrence
of flu?


Solution


The question asks whether the data provide sufficient evidence to indicate a dependence
between vaccine category and occurrence or nonoccurrence of flu. We therefore
analyze the data as a contingency table.


Table 14.4 Data tabulation for Example 14.3


Status    No Vaccine     One Shot       Two Shots      Total
Flu        24 (14.4)       9 (5.0)       13 (26.6)       46
No flu    289 (298.6)    100 (104.0)    565 (551.4)     954
Total     313            109            578            1000
The estimated expected cell frequencies may be calculated by using the appropriate
row and column totals,

$$\widehat{E(n_{ij})} = \frac{r_i c_j}{n}.$$

Thus, for example,

$$\widehat{E(n_{11})} = \frac{r_1 c_1}{n} = \frac{(46)(313)}{1000} = 14.4,$$

$$\widehat{E(n_{12})} = \frac{r_1 c_2}{n} = \frac{(46)(109)}{1000} = 5.0.$$


These and the remaining estimated expected cell frequencies are shown in parentheses 
in Table 14.4. 

The value of the test statistic $X^2$ will now be computed and compared with the
critical value of $\chi^2$ possessing $(r - 1)(c - 1) = (1)(2) = 2$ df. Then for $\alpha = .05$, we
will reject the null hypothesis when $X^2 > 5.991$. Substituting into the formula for
$X^2$, we obtain

$$X^2 = \frac{(24 - 14.4)^2}{14.4} + \frac{(289 - 298.6)^2}{298.6} + \cdots + \frac{(565 - 551.4)^2}{551.4} = 17.35.$$


Observing that $X^2$ falls in the rejection region, we reject the null hypothesis of
independence of the two classifications. If we choose to use the attained significance-level
approach to making our inference, use of Table 6, Appendix 3, establishes that
p-value < .005. The $\chi^2$ applet gives p-value = .00017. As is always the case, we find
agreement between our fixed $\alpha$-level approach to testing and the proper interpretation
of the p-value.
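To see which cells drive the rejection in Example 14.3, it can help to list each cell's contribution to $X^2$. The sketch below is illustrative only and uses the counts of Table 14.4; the variable names are introduced here. Because the code keeps the expected counts unrounded, its $X^2$ is about 17.31 rather than 17.35, which is based on the one-decimal expected counts shown in the table; the conclusion is unchanged.

```python
import math

# Observed counts from Table 14.4
# (rows: flu, no flu; columns: no vaccine, one shot, two shots).
observed = [
    [24, 9, 13],
    [289, 100, 565],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell to X^2, using expected counts r_i * c_j / n.
contributions = [
    [(observed[i][j] - r * c / n) ** 2 / (r * c / n) for j, c in enumerate(col_totals)]
    for i, r in enumerate(row_totals)
]
x2 = sum(sum(row) for row in contributions)

# With 2 df the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-x2 / 2)

for row in contributions:
    print([round(v, 2) for v in row])
print(round(x2, 2), round(p_value, 5))
```

In the printed contributions, the "flu" row dominates: more flu cases than expected among the unvaccinated and fewer than expected among those who received two shots.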


As established in Section 5.9, the $n_{ij}$'s are negatively correlated. For example,
$\mathrm{Cov}(n_{ij}, n_{kl}) = -n p_{ij} p_{kl}$ if $i \neq k$ or $j \neq l$. An adaptation of the result given
in Exercise 14.6(f) can be used to provide a large sample confidence interval for
$p_{ij} - p_{kl}$ if such an interval has practical interpretive value. Similarly, the marginal
proportions can be compared by “collapsing” the contingency table to only the row or
column marginal observations. The result in Exercise 14.6(f) directly applies to the
collapsed table. However, these “collapsed” marginal tables sacrifice any information
about the dependence between the row and column variables.
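As an illustration of the collapsed-table comparison, the sketch below applies the large-sample interval $\hat{p}_i - \hat{p}_j \pm z_{\alpha/2}\sqrt{[\hat{p}_i(1-\hat{p}_i) + \hat{p}_j(1-\hat{p}_j) + 2\hat{p}_i\hat{p}_j]/n}$ to two column totals of Table 14.3. The function name `multinomial_diff_ci` and the choice of defect types C and A are assumptions made for the example, not part of the text.

```python
import math
from statistics import NormalDist


def multinomial_diff_ci(n_i, n_j, n, conf=0.95):
    """Large-sample CI for p_i - p_j when n_i and n_j come from one multinomial sample.

    The + 2 * p_i * p_j term reflects Cov(n_i, n_j) = -n * p_i * p_j.
    """
    p_i, p_j = n_i / n, n_j / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt((p_i * (1 - p_i) + p_j * (1 - p_j) + 2 * p_i * p_j) / n)
    diff = p_i - p_j
    return diff - z * se, diff + z * se


# Column totals of Table 14.3: defect types C and A, out of n = 309 defects.
low, high = multinomial_diff_ci(128, 74, 309)
print(round(low, 3), round(high, 3))
```

Treating the two proportions as if they came from independent samples would drop the covariance term and give an interval that is too narrow.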

We have considered only the simplest hypothesis connected with a contingency 
table, that of independence between rows and columns. Many other hypotheses are 
possible, and numerous techniques have been devised to test these hypotheses. For 
further information on this topic, consult Agresti (2002) and Fienberg (1980). 


Exercises 


14.13 On the 40th anniversary of President John F. Kennedy's assassination, a FOX news poll showed
that most Americans disagree with the government's conclusions about the killing. The Warren
Commission found that Lee Harvey Oswald acted alone when he shot Kennedy, but many
Americans are not so sure about this conclusion. Do you think that we know all of the relevant
facts associated with Kennedy’s assassination, or do you think that some information has been
withheld? The following table contains the results of a nationwide poll of 900 registered voters.⁴


              We Know All       Some Relevant
              Relevant Facts    Facts Withheld    Not Sure

Democrat       42                309               31
Republican     64                246               46
Other          20                115               27


a Do the data provide sufficient evidence to indicate a dependence between party affiliation
and opinion about a possible cover-up? Test using $\alpha = .05$.

b Give bounds for the associated p-value and interpret the result.

c Applet Exercise Use the $\chi^2$ applet to obtain the approximate p-value.
d Why is the value you obtained in part (c) "approximate"? 


14.14 A study was conducted by Joseph Jacobson and Diane Wille to determine the effect of early
child care on infant-mother attachment patterns.⁵ In the study, 93 infants were classified as
either “secure” or “anxious” using the Ainsworth strange-situation paradigm. In addition, the
infants were classified according to the average number of hours per week that they spent in
child care. The data appear in the accompanying table.


                         Hours in Child Care
Attachment    Low            Moderate        High
Pattern       (0-3 hours)    (4-19 hours)    (20-54 hours)
Secure        24             35               5
Anxious       11             10               8


a Do the data indicate a dependence between attachment patterns and the number of hours
spent in child care? Test using $\alpha = .05$.


b Give bounds for the attained significance level. 


14.15 Suppose that the entries in a contingency table that appear in row $i$ and column $j$ are denoted
$n_{ij}$, for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$; that the row and column totals are denoted $r_i$, for
$i = 1, 2, \ldots, r$, and $c_j$, for $j = 1, 2, \ldots, c$; and that the total sample size is $n$.

a Show that

$$X^2 = \sum_{j=1}^{c} \sum_{i=1}^{r} \frac{[n_{ij} - \widehat{E(n_{ij})}]^2}{\widehat{E(n_{ij})}} = n\left(\sum_{j=1}^{c} \sum_{i=1}^{r} \frac{n_{ij}^2}{r_i c_j} - 1\right).$$

Notice that this formula provides a computationally more efficient way to compute the
value of $X^2$.


b Using the preceding formula, what happens to the value of X? if every entry in the contin- 
gency table is multiplied by the same integer constant k > 0? 


4. Source: Adapted from Dana Blanton, “Poll: Most Believe ‘Cover-Up’ of JFK Assassination Facts,”
http://www.foxnews.com/story/0,2933,102511,00.html, 10 February 2004.

5. Source: Linda Schmittroth (ed.), Statistical Record of Women Worldwide (Detroit and London: Gale
Research, 1991), pp. 8, 9, 335.


14.16 A survey to explore the relationship between voters’ church-attendance patterns and their
choice of presidential candidate was reported in the Riverside Press-Enterprise prior to the
2004 presidential election. Voters were asked how often they attended church services and
which of the two major presidential candidates (George W. Bush or John Kerry) they intended
to vote for in the election. The results of a similar survey are contained in the following table.


Church Attendance          Bush    Kerry
More than once per week     89      53
Once per week               87      68
Once or twice per month     93      85
Once or twice per year     114     134
Seldom/never                22      36


a Is there sufficient evidence to indicate dependence between reported frequency of church 
attendance and choice of presidential candidate in the 2004 presidential election? Test at 
the .05 level of significance. Place bounds on the attained significance level. 


b Give a 95% confidence interval for the proportion of individuals who report attending
church at least once per week.


In the academic world, students and their faculty supervisors often collaborate on research 
papers, producing works in which publication credit can take several forms. Many feel that the 
first authorship of a student’s paper should be given to the student unless the input from the 
faculty advisor was substantial. In an attempt to see whether this is in fact the case, authorship 
credit was studied for several different levels of faculty input and two objectives (dissertations 
versus nondegree research). The frequency of authorship assignment decisions for published 
dissertations is given in the accompanying tables as assigned by 60 faculty members and 161
students:


Faculty respondents

Authorship Assignment                                     High Input    Medium Input    Low Input
Faculty first author, student mandatory second author          4              0              0
Student first author, faculty mandatory second author         15             12              3
Student first author, faculty courtesy second author           2              7              7
Student sole author                                            2              3              5


Student respondents

Authorship Assignment                                     High Input    Medium Input    Low Input
Faculty first author, student mandatory second author         19              6              2
Student first author, faculty mandatory second author         19             41             27
Student first author, faculty courtesy second author           3              7             31
Student sole author                                            0              3              3


6. Source: Adapted from Bettye Wells Miller, “Faith Shows Ballot Clout,” Press-Enterprise (Riverside,
Calif.), 1 March 2004, p. A7.

7. Source: M. Martin Costa and M. Gatz, “Determination of Authorship Credit in Published Dissertations,”
Psychological Science 3(6) (1992): 54.
a Is there sufficient evidence to indicate a dependence between the authorship assignment
and the input of the faculty advisor as judged by faculty members? Test using $\alpha = .01$.

b Is there sufficient evidence to indicate a dependence between the authorship assignment
and the input of the faculty advisor as judged by students? Test using $\alpha = .01$.

c Have any of the assumptions necessary for a valid analysis in parts (a) and (b) been violated? 
What effect might this have on the validity of your conclusions? 


A study of the amount of violence viewed on television as it relates to the age of the viewer 
yielded the results shown in the accompanying table for 81 people. (Each person in the study 
was classified, according to the person’s TV-viewing habits, as a low-violence or high-violence 
viewer.) Do the data indicate that viewing of violence is not independent of age of viewer, at 
the 5% significance level? 


                           Age
Viewing          16-34    35-54    55 and Over
Low violence       8       12        21
High violence     18       15         7


The results of a study⁸ suggest that the initial electrocardiogram (ECG) of a suspected heart
attack victim can be used to predict in-hospital complications of an acute nature. The study 
included 469 patients with suspected myocardial infarction (heart attack). Each patient was 
categorized according to whether their initial ECG was positive or negative and whether the 
person suffered life-threatening complications subsequently in the hospital. The results are 
summarized in the following table. 


            Subsequent In-Hospital
            Life-Threatening Complications
ECG         No     Yes    Total
Negative    166      1     167
Positive    260     42     302
Total       426     43     469


a Is there sufficient evidence to indicate that whether or not a heart attack patient suffers
complications depends on the outcome of the initial ECG? Test using $\alpha = .05$.


b Give bounds for the observed significance level. 


Refer to Exercise 14.10. Test the hypothesis, at the 5% significance level, that the type A defects 
occur independently of the type B defects. 


An interesting and practical use of the $\chi^2$ test comes about in testing for segregation of species
of plants or animals. Suppose that two species of plants, A and B, are growing on a test plot. To 
assess whether the species tend to segregate, a researcher randomly samples n plants from the 
plot; the species of each sampled plant, and the species of its nearest neighbor are recorded. 
The data are then arranged in a table, as shown here. 


8. Source: J. E. Brush et al., “Use of the Initial Electrocardiogram to Predict In-Hospital Complications
of Acute Myocardial Infarction,” New England Journal of Medicine (May 1985).
                 Nearest Neighbor
Sampled Plant    A    B
A                a    b
B                c    d
                      n


If $a$ and $d$ are large relative to $b$ and $c$, we would be inclined to say that the species tend to
segregate. (Most of A’s neighbors are of type A, and most of B’s neighbors are of type B.) If
$b$ and $c$ are large compared to $a$ and $d$, we would say that the species tend to be overly mixed.
In either of these cases (segregation or overmixing), a $\chi^2$ test should yield a large value, and
the hypothesis of random mixing would be rejected. For each of the following cases, test the
hypothesis of random mixing (or, equivalently, the hypothesis that the species of a sample plant
is independent of the species of its nearest neighbor). Use $\alpha = .05$ in each case.


a $a = 20$, $b = 4$, $c = 8$, $d = 18$.
b $a = 4$, $b = 20$, $c = 18$, $d = 8$.
c $a = 20$, $b = 4$, $c = 18$, $d = 8$.


r × c Tables with Fixed Row or Column Totals


In the previous section, we described the analysis of an $r \times c$ contingency table by
using examples that for all practical purposes fit the multinomial experiment described 
in Section 14.1. Although the methods of collecting data in many surveys may meet 
the requirements of a multinomial experiment, other methods do not. For example, we 
might not wish to randomly sample the population described in Example 14.3 because 
we might find that due to chance one category is completely missing. People who have 
received no flu shots might fail to appear in the sample. We might decide beforehand 
to interview a specified number of people in each column category, thereby fixing 
the column totals in advance. We would then have three separate and independent 
binomial experiments, corresponding to “no vaccine,” “one shot,” and “two shots,”
with respective probabilities $p_1$, $p_2$, and $p_3$ that a person contracts the flu. In this case,
we are interested in testing the null hypothesis

$$H_0: p_1 = p_2 = p_3.$$

(We actually are testing the equivalence of three binomial distributions.) Under this
hypothesis, the MLEs of the expected cell frequencies are the same as in Section 14.4,
namely,

$$\widehat{E(n_{ij})} = \frac{r_i c_j}{n}.$$

How many degrees of freedom are associated with the approximating $\chi^2$ distribution?
There are $rc$ probabilities overall. Since the column totals are fixed, the sum of the
probabilities in each column must equal one. That is,

$$p_{1j} + p_{2j} + \cdots + p_{rj} = 1, \quad \text{for each } j = 1, 2, \ldots, c,$$
and there are $c$ linear constraints on the $p_{ij}$'s, resulting in a loss of $c$ df. Finally, it is
necessary to estimate $r - 1$ row probabilities (the estimated row probabilities must
add to 1), decreasing the degrees of freedom by an additional $r - 1$. Thus, the number
of degrees of freedom associated with $X^2$ computed for an $r \times c$ table with fixed
column totals is df $= rc - c - (r - 1) = (r - 1)(c - 1)$.

To illustrate, suppose that we wish to test a hypothesis concerning the equivalence 
of four binomial populations, as indicated in the following example. 


EXAMPLE 14.4


A survey of voter sentiment was conducted in four midcity political wards to compare
the fraction of voters favoring candidate A. Random samples of 200 voters were polled
in each of the four wards, with results as shown in Table 14.5. Do the data present
sufficient evidence to indicate that the fractions of voters favoring candidate A differ
in the four wards?


Solution


You will observe that the mechanics for testing hypotheses concerning the equivalence
of the parameters of the four binomial populations that correspond to the four wards
are identical to the mechanics associated with testing the hypothesis of independence
of the row and column classifications. If we denote the fraction of voters favoring A
as $p$ and hypothesize that $p$ is the same for all four wards, we imply that the first-row
probabilities are all equal to $p$ and that the second-row probabilities are all equal
to $1 - p$. The MLE (combining the results from all four samples) for the common
value of $p$ is $\hat{p} = 236/800 = r_1/n$. The expected number of individuals who favor
candidate A in ward 1 is $E(n_{11}) = 200p$, which is estimated by the value

$$\widehat{E(n_{11})} = 200\hat{p} = 200\left(\frac{236}{800}\right) = \frac{c_1 r_1}{n}.$$


Notice that even though we are considering a very different experiment than that 
considered in Section 14.4, the estimated mean cell frequencies are computed the 
same way as they were in Section 14.4. The other estimated expected cell frequencies, 
calculated by using the row and column totals, appear in parentheses in Table 14.5. 
We see that

$$X^2 = \sum_{j=1}^{4} \sum_{i=1}^{2} \frac{[n_{ij} - \widehat{E(n_{ij})}]^2}{\widehat{E(n_{ij})}}
= \frac{(76 - 59)^2}{59} + \frac{(124 - 141)^2}{141} + \cdots + \frac{(152 - 141)^2}{141} = 10.72.$$


Table 14.5 Data tabulation for Example 14.4

                              Ward
Opinion           1            2            3            4            Total
Favor A            76 (59)      53 (59)      59 (59)      48 (59)     236
Do not favor A    124 (141)    147 (141)    141 (141)    152 (141)    564
Total             200          200          200          200          800
The critical value of $\chi^2$ for $\alpha = .05$ and $(r - 1)(c - 1) = (1)(3) = 3$ df is 7.815.
Because $X^2$ exceeds this critical value, we reject the null hypothesis and conclude
that the fraction of voters favoring candidate A is not the same for all four wards. The
associated p-value is given by $P(\chi^2 > 10.72)$ when $\chi^2$ has 3 df. Thus, .01 < p-value
< .025. The $\chi^2$ applet gives $P(\chi^2 > 10.72) = .01334$.
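The homogeneity test of Example 14.4 can be reproduced numerically. In the sketch below, `chi2_upper_tail` is an illustrative helper (not from the text) that handles any positive integer df by starting from the 1-df or 2-df tail area and stepping up two degrees of freedom at a time; it is offered as one convenient way to obtain the tail area, not as the applet's method.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df > x), built from the 1-df or 2-df tail.

    Uses Q(x; d + 2) = Q(x; d) + (x/2)^(d/2) * exp(-x/2) / Gamma(d/2 + 1).
    """
    half = x / 2.0
    if df % 2 == 0:
        tail, d = math.exp(-half), 2             # 2 df: exp(-x/2)
    else:
        tail, d = math.erfc(math.sqrt(half)), 1  # 1 df: P(|Z| > sqrt(x))
    while d < df:
        tail += half ** (d / 2) * math.exp(-half) / math.gamma(d / 2 + 1)
        d += 2
    return tail


# Table 14.5: rows are "favor A" / "do not favor A"; columns are wards 1-4,
# each a separate sample of 200 voters (column totals fixed in advance).
observed = [
    [76, 53, 59, 48],
    [124, 147, 141, 152],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2 = sum(
    (observed[i][j] - r * c / n) ** 2 / (r * c / n)
    for i, r in enumerate(row_totals)
    for j, c in enumerate(col_totals)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2_upper_tail(x2, df)
print(round(x2, 2), df, round(p_value, 5))
```

The statistic is computed exactly as for a test of independence; only the sampling scheme, and hence the interpretation as a comparison of four binomial proportions, differs. The tail area obtained this way is close to the .01334 reported above.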


This example was worked out in Exercise 10.106 by the likelihood ratio method.
Notice that the conclusions are the same.

The test implemented in Example 14.4 is a test of the equality of four binomial proportions
based on independent samples from each of the corresponding populations.
Such a test is often referred to as a test of homogeneity of the binomial populations.
If there are more than two row categories and the column totals are fixed, the $\chi^2$ test
is a test of the equivalence of the proportions in $c$ multinomial populations.


Exercises


14.22 A study to determine the effectiveness of a drug (serum) for the treatment of arthritis resulted
in the comparison of two groups each consisting of 200 arthritic patients. One group was
inoculated with the serum whereas the other received a placebo (an inoculation that appears to
contain serum but actually is not active). After a period of time, each person in the study was
asked whether his or her arthritic condition had improved. The results in the accompanying
table were observed. Do these data present sufficient evidence to indicate that the proportions
of arthritic individuals who said their condition had improved differed depending on whether
they received the serum?


Condition       Treated    Untreated
Improved         117          74
Not improved      83         126


a Test by using the $X^2$ statistic. Use $\alpha = .05$.

b Test by using the Z test of Section 10.3 and $\alpha = .05$. Compare your result with that in
part (a).

c Give bounds for the attained significance level associated with the test in part (a).


14.23 The $\chi^2$ test used in Exercise 14.22 is equivalent to the two-tailed Z test of Section 10.3, provided
$\alpha$ is the same for the two tests. Show algebraically that the $\chi^2$ test statistic $X^2$ is the square of
the test statistic Z for the equivalent test.


14.24 How do Americans in the “sandwich generation” balance the demands of caring for older and
younger relatives? The following table contains the results of a telephone poll of Americans
aged 45 to 55 years conducted by the New York Times.⁹ From each of four subpopulations,
200 individuals were polled and asked whether they were providing financial support for their
parents.


9. Source: Adapted from Tamar Lewin, “Report Looks at a Generation, and Caring for Young and Old,” 
New York Times online, 11 July 2001. 
                          Subpopulation
           White        African      Hispanic     Asian
Support    Americans    Americans    Americans    Americans
Yes         40           56           68           84
No         160          144          132          116


a Use the $\chi^2$ test to determine whether the proportions of individuals providing financial
support for their parents differ for the four subpopulations. Use $\alpha = .05$.

b Since the samples are independent, confidence intervals to compare the proportions in 
each subpopulation who financially support their parents can be obtained using the method 
presented in Section 8.6. 


i Give a 95% confidence interval for the difference in proportions who provide parental 
support for White and Asian Americans. 

ii Use the Bonferroni method presented in Section 13.12 to give six simultaneous confi- 
dence intervals to compare the proportions who provide parental support for all pairs 
of subpopulations. The objective is to provide intervals with simultaneous confidence 
coefficient at least .95. 

iii Based on your answer to part (ii), which subpopulations differ from the others regarding 
the proportion who provide financial support for their parents? 


14.25 Does education really make a difference in how much money you will earn? Researchers randomly
selected 100 people from each of three income categories (“marginally rich,” “comfortably
rich,” and “super rich”) and recorded their education levels. The data are summarized
in the table that follows.¹⁰


Highest                 Marginally    Comfortably
Education Level         Rich          Rich           Super Rich
No college               32            20             23
Some college             13            16              1
Undergraduate degree     43            51             60
Postgraduate study       12            13             16
Total                   100           100            100


a Describe the independent multinomial populations whose proportions are compared in the
$\chi^2$ analysis.

b Do the data indicate that the proportions in the various education levels differ for the three
income categories? Test at the $\alpha = .01$ level.

c Construct a 95% confidence interval for the difference in proportions with at least an undergraduate
degree for individuals who are marginally and super rich. Interpret the interval.


14.26 A manufacturer of buttons wished to determine whether the fraction of defective buttons 
produced by three machines varied from machine to machine. Samples of 400 buttons were 
selected from each of the three machines, and the number of defectives was counted for each
sample. The results are shown in the accompanying table. Do these data present sufficient 
evidence to indicate that the fraction of defective buttons varied from machine to machine? 


10. Source: Adapted from Rebecca Piirto Heath, “Life on Easy Street,” American Demographics, April
1997, p. 33.
Machine    Number of
Number     Defectives

1          16
2          24
3           9


a Test, using $\alpha = .05$, with a $\chi^2$ test.
*b Test, using $\alpha = .05$, with a likelihood ratio test. [Hint: Refer to Exercise 10.106.]¹¹


Text not available due to copyright restrictions 


Traditionally, U.S. labor unions have been content to leave the management of companies to 
managers and corporate executives. In Europe, worker participation in management decision 
making is an accepted idea that is becoming increasingly popular. To study the effect of worker 
participation, 100 workers were interviewed in each of two separate German manufacturing 
plants. One plant had active worker participation in managerial decision making; the other 
plant did not. Each selected worker was asked whether he or she approved of the managerial 
decisions made within the plant. The results follow. 


                     Participation    No Participation

Generally approve    73               51
Do not approve       27               49


a Do the data indicate a difference in the proportions of workers in the two plants who
generally approve of managerial decisions? Test at the .05 significance level using the
χ² test.


b Construct a 95% lower confidence bound for the difference in the proportion of workers 
who approve of managerial decisions in the plants with and without worker participation. 


11. Exercises preceded by an asterisk are optional. 


Text not available due to copyright restrictions 
Does the resulting confidence bound indicate that a greater proportion of workers approve
of managerial decisions in the plant with active worker participation? Why? 


c Could the conclusion that you reached in part (b) have resulted from the χ² test implemented
in part (a)? Why? 


A survey was conducted to study the relationship between lung disease and air pollution. Four 
areas were chosen for the survey, two cities frequently plagued with smog and two nonurban 
areas in states that possessed low air-pollution counts. Only adult permanent residents of the area 
were included in the study. Random samples of 400 adult permanent residents from each area 
gave the results listed in the accompanying table. 


Area               Number with Lung Disease

City A             34
City B             42
Nonurban area 1    21
Nonurban area 2    18


a Do the data provide sufficient evidence to indicate a difference in the proportions with lung 
disease for the four locations? 


b Should cigarette smokers have been excluded from the samples? How would this affect 
inferences drawn from the data? 


Refer to Exercise 14.29. Estimate the difference in the fractions of adult permanent residents 
with lung disease for cities A and B. Use a 95% confidence interval. 


A survey was conducted to investigate interest of middle-aged adults in physical-fitness pro- 
grams in Rhode Island, Colorado, California, and Florida. The objective of the investigation 
was to determine whether adult participation in physical-fitness programs varies from one 
region of the United States to another. Random samples of people were interviewed in each 
state, and the data reproduced in the accompanying table were recorded. Do the data indicate 
differences among the rates of adult participation in physical-fitness programs from one state 
to another? What would you conclude with α = .01?


Participation Rhode Island Colorado California Florida 


Yes 46 63 108 121 
No 149 178 192 179 


Other Applications 


The applications of the χ² test in analyzing categorical data described in Sections
14.3-14.5 represent only a few of the interesting classification problems that may be 
approximated by the multinomial experiment and for which our method of analysis is 
appropriate. Generally, these applications are complicated to a greater or lesser degree 
because the numerical values of the cell probabilities are unspecified and hence require 
the estimation of one or more population parameters. Then, as in Sections 14.4 and 
14.5, we can estimate the cell probabilities. Although we omit the mechanics of the 
statistical tests, several additional applications of the χ² test are worth mentioning as a
matter of interest.

For example, suppose that we wish to test a hypothesis stating that a population
possesses a normal probability distribution. The cells of a sample frequency histogram
would correspond to the k cells of the multinomial experiment, and the observed cell
frequencies would be the number of measurements falling into each cell of the his-
togram. Given the hypothesized normal probability distribution for the population,
we could use the areas under the normal curve to calculate the theoretical cell proba-
bilities and hence the expected cell frequencies. MLEs must be employed when μ and
σ are unspecified for the normal population, and these parameters must be estimated
to obtain the estimated cell probabilities.
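
The normality check just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the cut points, and the simulated sample are hypothetical, and μ and σ are estimated from the raw measurements (divisor n) as a convenient stand-in for MLEs based on the cell counts. Two estimated parameters cost two additional degrees of freedom, so the statistic is referred to a χ² distribution with k − 1 − 2 degrees of freedom.

```python
import random
from statistics import NormalDist, fmean, pstdev


def normal_gof_statistic(data, cut_points):
    """Return (X2, df) for a chi-square goodness-of-fit check of normality.

    cut_points are the interior cell boundaries; the k cells are
    (-inf, c1], (c1, c2], ..., (c_{k-1}, +inf).
    """
    n = len(data)
    mu_hat = fmean(data)        # estimate of mu from the raw data
    sigma_hat = pstdev(data)    # estimate of sigma (divisor n, not n - 1)
    fitted = NormalDist(mu_hat, sigma_hat)

    bounds = [float("-inf")] + sorted(cut_points) + [float("inf")]
    x2 = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if lo < x <= hi)
        p_lo = 0.0 if lo == float("-inf") else fitted.cdf(lo)
        p_hi = 1.0 if hi == float("inf") else fitted.cdf(hi)
        expected = n * (p_hi - p_lo)          # estimated expected cell count
        x2 += (observed - expected) ** 2 / expected

    k = len(bounds) - 1
    df = k - 1 - 2              # two parameters (mu, sigma) were estimated
    return x2, df


# Hypothetical data: 200 simulated measurements, six cells.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(200)]
cut_points = [35, 45, 50, 55, 65]
x2, df = normal_gof_statistic(sample, cut_points)
```

The resulting X² would then be compared with the upper-tail χ² critical value for df degrees of freedom, after confirming that every estimated expected cell count is at least five.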

The construction of a two-way table to investigate dependency between two clas- 
sifications can be extended to three or more classifications. For example, if we wish 
to test the mutual independence of three classifications, we would employ a three- 
dimensional “table.” The reasoning and methodology associated with the analysis of 
both the two- and three-way tables are identical although the analysis of the three-way 
table is a bit more complex. 

A third and interesting application of our methodology would be its use in the 
investigation of the rate of change of a multinomial (or binomial) population as 
a function of time. For example, we might study the problem-solving ability of a 
human (or any animal) subjected to an educational program and tested over time. If, 
for instance, the human is tested at prescribed intervals of time and the test is of the 
yes or no type, yielding a number of correct answers y that would follow a binomial 
probability distribution, we would be interested in the behavior of the probability of 
a correct response p as a function of time. If the number of correct responses was 
recorded for c time periods, the data would fall in a 2 x c table similar to that in 
Example 14.4 (Section 14.5). We would then be interested in testing the hypothesis 
that p is equal to a constant—that is, that no learning has occurred—and we would 
then proceed to more interesting hypotheses to determine whether the data present 
sufficient evidence to indicate a gradual (say, linear) change over time as opposed to 
an abrupt change at some point in time. The procedures that we have described could 
be extended to decisions involving more than two alternatives. 
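
As a rough illustration of the first step above (testing that p is constant over the c time periods), the following Python sketch computes X² for a 2 × c table. The function name and the counts are hypothetical and are not taken from the text; the common value of p is estimated by pooling all periods, which leaves c − 1 degrees of freedom.

```python
def chi_square_constant_p(correct, trials):
    """Return (X2, df) for H0: p is the same in every time period.

    correct[j] is the number of correct responses in period j,
    trials[j] is the number of questions asked in period j.
    """
    p_hat = sum(correct) / sum(trials)        # pooled estimate of p under H0
    x2 = 0.0
    for y, n in zip(correct, trials):
        expected_yes = n * p_hat
        expected_no = n * (1 - p_hat)
        x2 += (y - expected_yes) ** 2 / expected_yes
        x2 += ((n - y) - expected_no) ** 2 / expected_no
    df = len(trials) - 1                      # (2 - 1)(c - 1)
    return x2, df


# Hypothetical record: 20 yes-or-no questions in each of four periods.
correct = [8, 10, 13, 15]
trials = [20, 20, 20, 20]
x2, df = chi_square_constant_p(correct, trials)

CRITICAL_05_DF3 = 7.815                       # upper .05 point of chi-square, 3 df
reject_h0 = x2 > CRITICAL_05_DF3
```

With these made-up counts the statistic falls short of the .05 critical value, so the omnibus test alone would not detect learning even though the counts rise steadily. This is one reason a test aimed at a gradual (say, linear) trend can be more informative than the general test of equal proportions.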

You will observe that our change over time example is common to business, to 
industry, and to many other fields, including the social sciences. For example, we 
might wish to study the rate of consumer acceptance of a new product for various 
types of advertising campaigns as a function of the length of time that the campaign 
has been in effect. Or we might wish to study the trend in the lot-fraction defective in 
a manufacturing process as a function of time. Both these examples, as well as many 
others, require a study of the behavior of a binomial (or multinomial) process as a 
function of time. 

The examples just described are intended to suggest the relatively broad application
of the χ² analysis of categorical data, a fact that should be borne in mind by the
experimenter concerned with this type of data. The statistical test employing X² as
a test statistic is often called a goodness-of-fit test. Its application for some of these
examples requires care in the determination of the appropriate estimates and the
number of degrees of freedom for X², which for some of these problems may be
rather complex.
Summary and Concluding Remarks


The material in this chapter has been concerned with tests of hypotheses regarding the 
cell probabilities associated with multinomial experiments (Sections 14.2 and 14.3) 
or several independent multinomial experiments (Section 14.5). When the number of 
observations n is large, the test statistic X² can be shown to possess, approximately,
a χ² probability distribution in repeated sampling, the number of degrees of freedom
depending on the particular application. In general, we assume that n is large and that
the minimum expected cell frequency is equal to or greater than five. 

Several words of caution concerning the use of the X² statistic as a method of an-
alyzing categorical data are appropriate. The determination of the correct number of
degrees of freedom associated with the X² statistic is critical in locating the rejection
region. If the number is specified incorrectly, erroneous conclusions might result.
Notice, too, that nonrejection of the null hypothesis does not imply that it should 
be accepted. We would have difficulty in stating a meaningful alternative hypothe- 
sis for many practical applications, and therefore we would lack knowledge of the 
probability of making a type II error. For example, we hypothesize that the two 
classifications of a contingency table are independent. A specific alternative must 
specify a measure of dependence that may or may not possess practical significance 
to the experimenter. Finally, if parameters are missing and the expected cell fre- 
quencies must be estimated, missing parameters should be estimated by the method 
of maximum likelihood in order that the test be valid. In other words, the applica-
tion of the χ² test for other than the simple applications outlined in Sections 14.3-14.5
will require experience beyond the scope of this introductory presentation of
the subject.
Supplementary Exercises


List the characteristics of a multinomial experiment. 


A survey was conducted to determine student, faculty, and administration attitudes on a new 
university parking policy. The distribution of those favoring or opposing the policy was as shown 
in the accompanying table. Do the data provide sufficient evidence to indicate that attitudes 
regarding the parking policy are independent of student, faculty, or administration status? 


Opinion Student Faculty Administration 


Favor 252 107 43 
Oppose 139 81 40 


How would you rate yourself as a driver? According to a survey conducted by the Field 
Institute,¹³ most Californians think that they are good drivers but have little respect for the
driving ability of others. The data in the following tables show the distribution of opinions, 
according to gender, for two different questions. Data in the first table give the results obtained 
when drivers rated themselves; the second table gives the results obtained when drivers rated 
others. Although not stated in the source, we assume that there were 100 men and 100 women 
in each of the surveyed groups. 


Rating self as driver 


Gender Excellent Good Fair 


Male 43 48 9 
Female 44 53 3 


Rating others as drivers 


Gender Excellent Good Fair Poor 


Male 4 42 41 13 
Female 3 48 35 14 


a Refer to the table in which drivers rated themselves. Is there sufficient evidence to indicate 
that there is a difference in the proportions in the three ratings categories for male and 
female drivers? Give bounds for the p-value associated with the test. 

b Refer to the table in which drivers rated others. Is there sufficient evidence to indicate that 
there is a difference in the proportions in the four ratings categories when rating male and 
female drivers? Give bounds for the p-value associated with the test. 

c Have you violated any assumptions in your analyses in parts (a) and (b)? What effect might 
these violations have on the validity of your conclusions? 


13. Source: Dan Smith, “Motorists Have Little Respect for Others’ Skills,” Press-Enterprise (Riverside,
Calif.), 15 March 1991.
Is the chance of getting a cold influenced by the number of social contacts a person has? A
study by Sheldon Cohen, a psychology professor at Carnegie Mellon University, seems to show
that the more social relationships a person has, the less susceptible the person is to colds. A
group of 276 healthy men and women were grouped according to their number of relationships
(such as parent, friend, church member, and neighbor). They were then exposed to a virus that
causes colds. An adaptation of the results is given in the following table.¹⁴


           Number of Relationships

           3 or fewer    4 or 5    6 or more

Cold       49            43        34
No cold    31            57        62
Total      80            100       96


a Do the data present sufficient evidence to indicate that susceptibility to colds is affected by 
the number of relationships that people have? Test at the 5% level of significance. 


b Give bounds for the p-value. 


Knee injuries are a major problem for athletes in many contact sports. However, athletes who 
play certain positions are more prone to knee injuries than other players. The prevalence and 
patterns of knee injuries among female collegiate rugby players were investigated using a 
simple questionnaire, to which 42 rugby clubs responded.¹⁵ A total of 76 knee injuries were
classified by type and the position (forward or back) played by the injured player. 


Position    Meniscal Tear    MCL Tear    ACL Tear    Other

Forward     13               14          7           4
Back        12               9           14          3


a Do the data provide sufficient evidence to indicate dependence between position played 
and type of knee injury? Test using α = .05.


b Give bounds for the p-value associated with the value for X² obtained in part (a).


c Applet Exercise Use the applet Chi-Square Probability and Quantiles to determine the
p-value associated with the value of X² obtained in part (a).


It is often not clear whether all properties of a binomial experiment are actually met in a given 
application. A goodness-of-fit test is desirable for such cases. Suppose that an experiment 
consisting of four trials was repeated 100 times. The number of repetitions on which a given 
number of successes was obtained is recorded in the accompanying table. Estimate p (assuming 
that the experiment was binomial), obtain estimates of the expected cell frequencies, and test 


14. Source: Adapted from David L. Wheeler, “More Social Roles Means Fewer Colds,” Chronicle of
Higher Education 43(44) (1997): A13. 


15. Source: Andrew S. Levy, M. J. Wetzler, M. Lewars, and W. Laughlin, “Knee Injuries in Women 
Collegiate Rugby Players,” American Journal of Sports Medicine 25(3) (1997): 360. 
for goodness of fit. To determine the appropriate number of degrees of freedom for X², notice
that p had to be estimated. 


Possible Results Number of Times 
(number of successes) Obtained 
0 11 
1 17 
2 42 
3 21 
4 9 


Counts on the number of items per cluster (or colony or group) must necessarily be greater than 
or equal to 1. Thus, the Poisson distribution generally does not fit these kinds of counts. For 
modeling counts on phenomena such as number of bacteria per colony, number of people per 
household, and number of animals per litter, the logarithmic series distribution often proves 
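useful. This discrete distribution has probability function given by

$$p(y \mid \theta) = -\frac{1}{\ln(1-\theta)} \cdot \frac{\theta^{y}}{y}, \qquad y = 1, 2, 3, \ldots, \quad 0 < \theta < 1,$$

where θ is an unknown parameter.


a Show that the MLE θ̂ of θ satisfies the equation

$$\bar{Y} = \frac{\hat{\theta}}{-(1-\hat{\theta})\ln(1-\hat{\theta})}, \qquad \text{where } \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$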


Text not available due to copyright restrictions 


Refer to the r × c contingency table of Section 14.4. Show that the MLE of the probability p_i
for row i is p̂_i = r_i/n, for i = 1, 2, ..., r.


A genetic model states that the proportions of offspring in three classes should be p²,
2p(1 − p), and (1 − p)² for a parameter p, 0 < p < 1. An experiment yielded frequen-
cies of 30, 40, and 30 for the respective classes.


a Does the model fit the data? (Use maximum likelihood to estimate p.) 


b Suppose that the hypothesis states that the model holds with p = .5. Do the data contradict 
this hypothesis? 


According to the genetic model for the relationship between sex and color blindness, the four 
categories, male and normal, female and normal, male and color blind, female and color blind, 


Text not available due to copyright restrictions 
should have probabilities given by p/2, (p²/2) + pq, q/2, and q²/2, respectively, where
q = 1 − p. A sample of 2000 people revealed 880, 1032, 80, and 8 in the respective categories.
Do these data agree with the model? Use α = .05. (Use maximum likelihood to estimate p.)


Suppose that (Y_1, Y_2, ..., Y_k) has a multinomial distribution with parameters n, p_1, p_2, ...,
p_k, and (X_1, X_2, ..., X_k) has a multinomial distribution with parameters m, p_1*, p_2*, ..., p_k*.
Construct a test of the null hypothesis that the two multinomial distributions are identical; that
is, test H_0: p_1 = p_1*, p_2 = p_2*, ..., p_k = p_k*.


In an experiment to evaluate an insecticide, the probability of insect survival was expected to 
be linearly related to the dosage D over the region of experimentation; that is, p = 1 + βD. An
experiment was conducted using four levels of dosage, 1, 2, 3, and 4 and 1000 insects in each
group. The resulting data were as shown in the following table. Do these data contradict the
hypothesis that p = 1 + βD? [Hint: Write the cell probabilities in terms of β and find the
MLE of β.]


Dosage Number of Survivors 


1 820 
2 650 
3 310 
4 50 
Introduction


Some experiments yield response measurements that defy exact quantification. For ex- 
ample, suppose that a judge is employed to evaluate and rank the instructional abilities 
of four teachers or the edibility and taste characteristics of five brands of cornflakes. 
Because it clearly is impossible to give an exact measure of teacher competence or 
food taste, the response measurements are of a completely different character than 
those presented in preceding chapters. In instances like these, the experiments gener- 
ate response measurements that can be ordered (ranked), but it is impossible to make 
statements such as "teacher A is twice as good as teacher B." Although experiments 
of this type occur in almost all fields of study, they are particularly evident in social 
science research and in studies of consumer preference. Nonparametric statistical 
methods are useful for analyzing this type of data. 
Nonparametric Statistics


Nonparametric statistical procedures not only apply to observations that are difficult
to quantify but also are particularly useful in making inferences in situations
where serious doubt exists about the assumptions that underlie standard methodology. 
For example, the t test for comparing a pair of means based on independent samples, 
Section 10.8, is based on the assumption that both populations are normally distributed 
with equal variances. The experimenter will never know whether these assumptions 
hold in a practical situation but often will be reasonably certain that departures from 
the assumptions will be small enough that the properties of the statistical procedure 
will be undisturbed. That is, α and β will be approximately what the experimenter
thinks they are. On the other hand, it is not uncommon for the experimenter to have
serious questions about assumption validity and wonder whether he or she is using
a valid statistical procedure. Sometimes this difficulty can be circumvented by using
a nonparametric statistical test, thereby avoiding a statistical procedure that is
only appropriate under a very uncertain set of assumptions.

The term nonparametric statistics has no standard definition that is agreed on by 
all statisticians. However, most would agree that nonparametric statistical methods 
work well under fairly general assumptions about the nature of any probability dis- 
tributions or parameters that are involved in an inferential problem. As a working 
definition, we will define parametric methods as those that apply to problems where 
the distribution(s) from which the sample(s) is (are) taken is (are) specified except for 
the values of a finite number of parameters. Nonparametric methods apply in all other 
instances. For example, the one-sample t test developed in Chapter 10 applies when
the population is normally distributed with unknown mean and variance. Because the
distribution from which the sample is taken is specified except for the values of two
parameters, μ and σ², the t test is a parametric procedure. Alternatively, suppose
that independent samples are taken from two populations and we wish to test the 
hypothesis that the two population distributions are identical but of unspecified form. 
In this case, the distribution is unspecified, and the hypothesis must be tested by using 
nonparametric methods. 

Valid employment of some of the parametric methods presented in preceding chap- 
ters requires that certain distributional assumptions are at least approximately met. 
Even if all assumptions are met, research has shown that nonparametric statistical 
tests are almost as capable of detecting differences among populations as the appli- 
cable parametric methods. They may be, and often are, more powerful in detecting 
population differences when the assumptions are not satisfied. For this reason many 
statisticians advocate the use of nonparametric statistical procedures in preference to 
their parametric counterparts. 


A General Two-Sample Shift Model 


Many times, an experimenter takes observations from two populations with the ob- 
jective of testing whether the populations have the same distribution. For example, 
if independent random samples X_1, X_2, ..., X_{n_1} and Y_1, Y_2, ..., Y_{n_2} are taken from
normal populations with equal variances and respective means μ_X and μ_Y, the
experimenter may wish to test H_0: μ_X − μ_Y = 0 versus H_a: μ_X − μ_Y < 0. In


FIGURE 15.1 Two normal distributions with equal variances but unequal means

FIGURE 15.2 Two density functions, with the density for Y shifted θ units to the right of that for X
this case, if H_0 is true, both populations are normally distributed with the same mean
and the same variance; that is, the population distributions are identical. If H_a is
true, then μ_Y > μ_X and the distributions of X_1 and Y_1 are the same, except that
the location parameter (μ_Y) for Y_1 is larger than the location parameter (μ_X) for
X_1. Hence, the distribution of Y_1 is shifted to the right of the distribution of X_1
(see Figure 15.1).

This is an example of a two-sample parametric shift (or location) model. The model
is parametric because the distributions are specified (normal) except for the values
of the parameters μ_X, μ_Y, and σ². The amount that the distribution of Y_1 is shifted
to the right of the distribution of X_1 is μ_Y − μ_X (see Figure 15.1). In the remainder
of this section, we define a shift model that applies for any distribution, normal or
otherwise.

Let X_1, X_2, ..., X_{n_1} be a random sample from a population with distribution
function F(x) and let Y_1, Y_2, ..., Y_{n_2} be a random sample from a population with
distribution function G(y). If we wish to test whether the two populations have the
same distribution (that is, H_0: F(z) = G(z) versus H_a: F(z) ≠ G(z), with the
actual form of F(z) and G(z) unspecified), a nonparametric method is required.
Notice that H_a is a very broad hypothesis. Many times, an experimenter may wish to
consider the more specific alternative hypothesis that Y_1 has the same distribution as
X_1 shifted by an (unknown) amount θ (see Figure 15.2); that is, that the distributions
differ in location. Then, G(y) = P(Y_1 ≤ y) = P(X_1 ≤ y − θ) = F(y − θ) for
some unknown parameter value θ. Notice that the particular form of F(x) remains
unspecified.

Throughout this chapter if we refer to the two-sample shift (location) model, we
assume that X_1, X_2, ..., X_{n_1} constitute a random sample from distribution function
F(x) and that Y_1, Y_2, ..., Y_{n_2} constitute a random sample from distribution function
G(y) = F(y − θ) for some unknown value θ. For the two-sample shift model,
H_0: F(z) = G(z) is equivalent to H_0: θ = 0. If θ is greater (less) than 0, then the
distribution of the Y-values is located to the right (left) of the distribution of the
X-values.
The Sign Test for a Matched-Pairs Experiment


Suppose that we have n pairs of observations of the form (X_i, Y_i) and that we wish
to test the hypothesis that the distribution of the X's is the same as that of the Y's
versus the alternative that the distributions differ in location (see Section 15.2). Much
as we did in Section 12.3, we let D_i = X_i − Y_i. One of the simplest nonparametric
tests is based on the signs of these differences and, reasonably enough, is called the
sign test. Under the null hypothesis that X_i and Y_i come from the same continuous
probability distribution, the probability that D_i is positive is equal to 1/2 (as is
the probability that D_i is negative). Let M denote the total number of positive (or
negative) differences. Then if the variables X_i and Y_i have the same distribution, M
has a binomial distribution with p = 1/2, and the rejection region for a test based
on M can be obtained by using the binomial probability distribution introduced in
Chapter 3. The sign test is summarized as follows.


The Sign Test for a Matched-Pairs Experiment
Let p = P(X > Y).
Null hypothesis: H_0: p = 1/2.
Alternative hypothesis: H_a: p > 1/2 (or p < 1/2 or p ≠ 1/2).
Test statistic: M = number of positive differences, where D_i = X_i − Y_i.


Rejection region: For H_a: p > 1/2, reject H_0 for the largest values of M;
for H_a: p < 1/2, reject H_0 for the smallest values of M; for H_a: p ≠ 1/2,
reject H_0 for very large or very small values of M.


Assumptions: The pairs (X_i, Y_i) are randomly and independently selected.
The following example illustrates the use of the sign test. 


EXAMPLE 15.1


The number of defective electrical fuses produced by each of two production lines, A
and B, was recorded daily for a period of 10 days, with the results shown in Table 15.1.
Assume that both production lines produced the same daily output. Compare the
number of defectives produced by A and B each day and let M equal the number of
days when A exceeded B. Do the data present sufficient evidence to indicate that either
production line produces more defectives than the other? State the null hypothesis to
be tested and use M as a test statistic.


Solution


Pair the observations as they appear in the data tabulation and let M be the number of
days that the observed number of defectives for production line A exceeds that for line
B. Under the null hypothesis that the two distributions of defectives are identical, the
probability p that A exceeds B for a given pair is p = .5, given that there are no ties.
Consequently, the null hypothesis is equivalent to the hypothesis that the binomial
parameter p = .5.
Table 15.1 Data for Example 15.1


Day    A      B
1      172    201
2      165    179
3      206    159
4      184    192
5      174    177
6      142    170
7      190    182
8      169    179
9      161    169
10     200    210


Very large or very small values of M are most contradictory to the null hypothesis.
Therefore, the rejection region for the test will be located by including the most
extreme values of M that at the same time provide a value of α that is suitable for
the test.

Suppose that we would like the value of α to be on the order of .05 or .10. We
commence the selection of the rejection region by including M = 0 and M = 10 and
calculate the α associated with this region, using p(y), the probability distribution
for the binomial random variable (see Chapter 3). With n = 10, p = .5, we have


$$\alpha = p(0) + p(10) = \binom{10}{0}(.5)^{10} + \binom{10}{10}(.5)^{10} = .002.$$


Because this value of α is too small, the region will be expanded by including the next
pair of M-values most contradictory to the null hypothesis, M = 1 and M = 9. The
value of α for this region (M = 0, 1, 9, 10) can be obtained from Table 1, Appendix 3:


α = p(0) + p(1) + p(9) + p(10) = .022.


This also is too small, so we again expand the region to include M = 0, 1, 2, 8, 9, 10.
You can verify that the corresponding value of α is .11. Suppose that this value of α is
acceptable to the experimenter; then we employ M = 0, 1, 2, 8, 9, 10 as the rejection
region for the test.

From the data, we observe that m = 2, so we reject the null hypothesis. We
conclude that sufficient evidence exists to indicate that the population distributions
for numbers of defective fuses are not identical. The probability of rejecting the null
hypothesis when it is true is only α = .11, and we are therefore reasonably confident
of our conclusion.

The experimenter in this example is using the test procedure as a rough tool for
detecting faulty production lines. The rather large value of α is not likely to disturb
him because he can easily collect additional data if he is concerned about making a
type I error in reaching his conclusion. ■
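
The trial-and-error search for a rejection region in Example 15.1 can be reproduced with a short Python sketch. The helper names are hypothetical; the calculation uses exact binomial probabilities with p = 1/2 rather than the rounded entries of Table 1, Appendix 3, so the second value comes out as .0215 instead of the .022 obtained by adding rounded table entries.

```python
from math import comb


def binomial_pmf(y, n, p=0.5):
    """P(M = y) for a binomial random variable with n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def two_tailed_alpha(c, n):
    """alpha for the region {M <= c} or {M >= n - c} when p = 1/2.

    Assumes c < n / 2 so that the two tails do not overlap.
    """
    tail_values = list(range(0, c + 1)) + list(range(n - c, n + 1))
    return sum(binomial_pmf(y, n) for y in tail_values)


# Candidate regions for n = 10 pairs:
# c = 0 -> {0, 10}, c = 1 -> {0, 1, 9, 10}, c = 2 -> {0, 1, 2, 8, 9, 10}
alphas = {c: two_tailed_alpha(c, 10) for c in range(3)}
```

Here alphas[0], alphas[1], and alphas[2] equal 2/1024, 22/1024, and 112/1024, which round to .002, .021, and .11 respectively.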


Attained significance levels (p-values) for the sign test are calculated as outlined
in Section 10.6. Specifically, if n = 15 and we wish to test H_0: p = 1/2 versus
H_a: p < 1/2 based on the observed value of M = 3, Table 1 of Appendix 3 can be
used to determine that (because n = 15, p = 1/2)


p-value = P(M ≤ 3) = .018.
For the two-tailed test (H_a: p ≠ 1/2), p-value = 2(.018) = .036.


EXAMPLE 15.2


Find the p-value associated with the sign test performed in Example 15.1.


Solution


The test in Example 15.1 is a two-tailed test of H_0: p = 1/2 versus H_a: p ≠ 1/2.
The calculated value of M is m = 2, so the p-value is 2P(M ≤ 2). Under the
null hypothesis, M has a binomial distribution with n = 10, p = .5, and Table 1,
Appendix 3, gives


p-value = 2P(M ≤ 2) = 2(.055) = .11.


Thus, .11 is the smallest value of α for which the null hypothesis can be rejected.
Notice that the p-value approach yields the same decision as that reached in Example
15.1, where a formal α = .11 level test was used. However, the p-value approach
eliminated the necessity of trying various rejection regions until we found one with a
satisfactory value for α. ■


One problem that may arise in connection with a sign test is that the observations
associated with one or more pairs may be equal and therefore may result in ties. When
this situation occurs, delete the tied pairs and reduce n, the total number of pairs.
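
The p-value calculation of Example 15.2, together with the rule for ties, can be sketched as follows. The function name sign_test is hypothetical; the data are those of Table 15.1, and the two-tailed p-value is taken as twice the smaller tail probability (capped at 1), which matches the 2P(M ≤ 2) used above.

```python
from math import comb


def sign_test(x, y):
    """Two-tailed sign test for paired data.

    Returns (m, n, p_value): m positive differences among the n untied
    pairs, and the exact two-tailed p-value under H0: p = 1/2.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # delete tied pairs
    n = len(diffs)                                    # reduced number of pairs
    m = sum(1 for d in diffs if d > 0)
    lower = sum(comb(n, k) for k in range(0, m + 1)) / 2**n   # P(M <= m)
    upper = sum(comb(n, k) for k in range(m, n + 1)) / 2**n   # P(M >= m)
    return m, n, min(1.0, 2 * min(lower, upper))


# Daily numbers of defective fuses from Table 15.1.
line_a = [172, 165, 206, 184, 174, 142, 190, 169, 161, 200]
line_b = [201, 179, 159, 192, 177, 170, 182, 179, 169, 210]
m, n, p_value = sign_test(line_a, line_b)
```

For these data m = 2 and n = 10, and the exact p-value is 112/1024, or about .11, in agreement with Example 15.2.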

You will also encounter situations where n, the number of pairs, is large. Then,
the values of α associated with the sign test can be approximated by using the normal
approximation to the binomial probability distribution discussed in Section 7.5. You
can verify (by comparing exact probabilities with their approximations) that these
approximations will be quite adequate for n as small as 10 or 15. This result is due to
the symmetry of the binomial probability distribution for p = .5. For n ≥ 25, the Z
test of Chapter 10 will suffice, where

$$Z = \frac{M - np}{\sqrt{npq}} = \frac{M - n/2}{(1/2)\sqrt{n}}.$$

This statistic would be used for testing the null hypothesis p = .5 against the
alternative p ≠ .5 for a two-tailed test or against the alternative p > .5 (or p < .5)
for a one-tailed test. The tests would use the familiar rejection regions of Chapter 10.
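
The comparison of exact and approximate probabilities suggested above can be carried out with a sketch like the one below. The function names are hypothetical, and the continuity correction of 1/2 is an optional refinement assumed here for illustration; it is not part of the Z statistic as stated.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_lower_tail(m, n):
    """Exact P(M <= m) for a binomial variable with n trials and p = 1/2."""
    return sum(comb(n, k) for k in range(0, m + 1)) / 2**n


def z_statistic(m, n):
    """Z = (M - n/2) / ((1/2) sqrt(n)), the large-sample sign test statistic."""
    return (m - n / 2) / (0.5 * sqrt(n))


def approx_lower_tail(m, n, continuity=True):
    """Normal approximation to P(M <= m), optionally continuity corrected."""
    shift = 0.5 if continuity else 0.0
    return NormalDist().cdf(z_statistic(m + shift, n))


# n = 15 pairs and M = 3, the case discussed earlier (exact value about .018).
exact = exact_lower_tail(3, 15)
approx = approx_lower_tail(3, 15)

# A large-sample case: 20 positive differences among 25 pairs.
z = z_statistic(20, 25)
```

In the second case z = 3.0, which exceeds the usual two-tailed critical value of 1.96 at α = .05, so the null hypothesis p = .5 would be rejected.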

The data of Example 15.1 are the result of a matched-pairs experiment. Suppose
that the paired differences are normally distributed with a common variance σ².
Will the sign test detect a shift in location of the two populations as effectively
as the Student's t test? Intuitively, we would suspect that the answer is no, and
this is correct because the Student's t test uses comparatively more information.
In addition to giving the sign of the difference, the t test uses the magnitudes of
the observations to obtain more accurate values for sample means and variances.
Thus, we might say that the sign test is not as "efficient" as the Student's t test; but
this statement is meaningful only if the populations conform to the assumption just
stated: The differences in paired observations are normally distributed with a common
variance σ². The sign test might be more efficient when these assumptions are not
satisfied.


Sign Test for Large Samples: n > 25
Null hypothesis: H_0: p = .5 (neither treatment is preferred to the other).


Alternative hypothesis: H_a: p ≠ .5 for a two-tailed test (Note: We use the
two-tailed test for an example. Many analyses require a one-tailed test.)


Test statistic: Z = [M − n/2]/[(1/2)√n].
Rejection region: Reject H_0 if z > z_{α/2} or if z < −z_{α/2}, where z_{α/2} is
obtained from Table 3, Appendix 3.


The sign test actually tests the null hypothesis that the median of the variables D_i 
is zero versus the alternative that it is different from zero. [The median of the variables 
D_i being zero does imply that P(D_i < 0) = P(D_i > 0).] If the variables X_i and Y_i 
have the same distribution, the median of the variables D_i will be zero, as previously 
discussed. However, for models other than the shift model, there are other situations 
in which the median of the variables D_i is zero. In these instances, the null hypothesis 
for the sign test is slightly more general than the statement that X_i and Y_i have the 
same distribution. 

Summarizing, the sign test is an easily applied nonparametric procedure for comparing 
two populations. No assumptions are made concerning the underlying population 
distributions. The value of the test statistic can be obtained quickly by a visual 
count, and the rejection region (or p-value) can be found easily by using a table of 
binomial probabilities. Furthermore, we need not know the exact values of pairs of 
responses, just whether X_i > Y_i for each pair (X_i, Y_i). Exercise 15.5 provides an 
example of the use of the sign test for data of this sort. 
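
As a minimal sketch of the procedure just summarized, the following Python function drops tied pairs, counts the positive differences, and computes an exact two-tailed p-value from the binomial distribution with p = 1/2. The function name and the small data set are hypothetical, introduced here only for illustration.

```python
from math import comb

def sign_test(x, y):
    """Exact two-tailed sign test for paired data; tied pairs are deleted.

    Returns (m, n, p_value), where m is the number of positive differences
    and n is the number of untied pairs.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    m = sum(1 for d in diffs if d > 0)
    tail = min(m, n - m)
    # P(M <= tail) under H0: p = 1/2, doubled for the two-tailed test.
    lower = sum(comb(n, k) for k in range(tail + 1)) / 2 ** n
    return m, n, min(1.0, 2 * lower)

# Hypothetical paired responses; the third pair is tied and is dropped.
x = [5, 7, 6, 9, 4, 8, 6, 7]
y = [3, 6, 6, 5, 5, 4, 2, 5]
m, n, p_value = sign_test(x, y)
print(m, n, p_value)
```

Only the signs of the differences enter the calculation, so the same function applies when the data record nothing more than which member of each pair is larger (coded, say, as 1 versus 0).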


Exercises 


What significance levels between α = .01 and α = .15 are available for a two-tailed sign test 
with 25 paired observations? (Make use of tabulated values in Table 1, Appendix 3, n = 25.) 
What are the corresponding rejection regions? 


A study reported in the American Journal of Public Health (Science News)—the first to follow 
lead levels in blood for law-abiding handgun hobbyists using indoor firing ranges—documents 
a considerable risk of lead poisoning.¹ Lead exposure measurements were made on 17 members 
of a law enforcement trainee class before, during, and after a 3-month period of firearm 
instruction at a state-owned indoor firing range. No trainees had elevated lead levels in their 
blood before training, but 15 of the 17 ended training with blood lead levels deemed “elevated” 
by the Occupational Safety and Health Administration (OSHA). Is there sufficient evidence to 
claim that indoor firing range use increases blood-level readings? 


a Give the associated p-value. 

b What would you conclude at the α = .01 significance level? 

c Use the normal approximation to give the approximate p-value. Does the normal 
approximation appear to be adequate when n = 17? 


1. Source: Science News, 136 (August 1989): 126. 


Clinical data concerning the effectiveness of two drugs for treating a disease were collected 
from ten hospitals. The number of patients treated with the drugs differed for the various 
hospitals. The data are given in the table that follows. 


                      Drug A                            Drug B 

            Number   Number     Percentage    Number   Number     Percentage 
Hospital    Treated  Recovered  Recovered     Treated  Recovered  Recovered 
 1          84       63         75.0          96       82         85.4 
 2          63       44         69.8          83       69         83.1 
 3          56       48         85.7          91       73         80.2 
 4          77       57         74.0          47       35         74.5 
 5          29       20         69.0          60       42         70.0 
 6          48       40         83.3          27       22         81.5 
 7          61       42         68.9          69       52         75.4 
 8          45       35         77.8          72       57         79.2 
 9          79       57         72.2          89       76         85.4 
10          62       48         77.4          46       37         80.4 


a Do the data indicate a difference in the recovery rates for the two drugs? Give the associated 
p-value. 


b Why might it be inappropriate to use the t test to analyze the data? 


For a comparison of the academic effectiveness of two junior high schools A and B, an experi- 
ment was designed using ten sets of identical twins, each twin having just completed the sixth 
grade. In each case, the twins in the same set had obtained their previous schooling in the same 
classrooms at each grade level. One child was selected at random from each set and assigned to 
school A. The other was sent to school B. Near the end of the ninth grade, an achievement test 
was given to each child in the experiment. The results are shown in the accompanying table. 


Twin Pair A B Twin Pair A B 
1 67 39 6 50 52 
2 80 75 7 63 56 
3 65 69 8 81 72 
4 70 55 9 86 89 
5 86 74 10 60 47 


a Using the sign test, test the hypothesis that the two schools are the same in academic 
effectiveness, as measured by scores on the achievement test, against the alternative that 
the schools are not equally effective. Give the attained significance level. What would you 
conclude with α = .05? 

b Suppose it is suspected that junior high school A has a superior faculty and better learning 
facilities. Test the hypothesis of equal academic effectiveness against the alternative that 
school A is superior. What is the p-value associated with this test? 


New food products are frequently subjected to taste tests by a panel of judges. The judges 
are usually asked to state a preference for one food over another so that no quantitative scale 
need be employed. Suppose that two new mixtures, A and B, of an orange-flavored drink are 
presented to ten judges. The preferences of the judges are given in the accompanying table. 
Does this evidence indicate a significant difference between the tastes of A and B, at the 5% 
significance level? 


Judge Preference | Judge Preference 
1 A 6 A 
2 A 7 B 
3 A 8 A 
4 A 9 B 
5 A 10 A 


On clear, cold nights in the central Florida citrus region, the precise location of below-freezing 
temperatures is important because the methods of protecting trees from freezing conditions 
are very expensive. One method of locating likely cold spots is by relating temperature to 
elevation. It is conjectured that on calm nights the cold spots will be at low elevations. The 
highest and lowest spots in a particular grove yielded the minimum temperatures listed in the 
accompanying table for ten cold nights in a recent winter. 


Night High Elevation Low Elevation 


1 32.9 31.8 
2 33.2 31.9 
3 32.0 29.2 
4 33.1 33.2 
5 33.5 33.0 
6 34.6 33.9 
7 32.1 31.0 
8 33.1 32.5 
9 30.2 28.9 
10 29.1 28.0 


a Is there sufficient evidence to support the conjecture that low elevations tend to be colder? 
(Use the sign test. Give the associated p-value.) 


b Would it be reasonable to use a t test on the data? Why or why not? 


A psychological experiment was conducted to compare the lengths of response time (in seconds) 
for two different stimuli. To remove natural person-to-person variability in the responses, both 
stimuli were applied to each of nine subjects, thus permitting an analysis of the difference 
between response times within each person. The results are given in the following table. 


Subject Stimulus 1 Stimulus 2 


1 9.4 10.3 
2 7.8 8.9 
3 5.6 4.1 
4 12.1 14.7 
5 6.9 8.7 
6 4.2 7.1 
7 8.8 11.3 
8 7.7 5.2 
9 6.4 7.8 


a Use the sign test to determine whether sufficient evidence exists to indicate a difference in 
mean response for the two stimuli. Use a rejection region for which α < .05. 


b Test the hypothesis of no difference in mean response, using Student's t test. 


Refer to Exercise 12.15. Using the sign test, do you find sufficient evidence to support 
concluding that completion times differ for the two populations? Use α = .10. 


The data set in the accompanying table represents the number of industrial accidents in 12 
manufacturing plants for 1-week periods before and after an intensive promotion on safety. 


Plant Before After | Plant Before After 
1 3 2 7 5 3 
2 4 1 8 3 3 
3 6 3 9 2 0 
4 3 5 10 4 3 
5 4 4 11 4 1 
6 5 2 12 5 2 


a Do the data support the claim that the campaign was successful? What is the attained 
significance level? What would you conclude with α = .01? 

b Discuss the problems associated with a parametric analysis designed to answer the question 
in part (a). 


The Wilcoxon Signed-Rank Test 
for a Matched-Pairs Experiment 


As in Section 15.3, assume that we have n paired observations of the form (X_i, Y_i) 
and that D_i = X_i − Y_i. Again we assume that we are interested in testing the 
hypothesis that the X's and the Y's have the same distribution versus the alternative 
that the distributions differ in location. Under the null hypothesis of no difference in 
the distributions of the X's and Y's, you would expect (on the average) half of the 
differences in pairs to be negative and half to be positive. That is, the expected number 
of negative differences between pairs is n/2 (where n is the number of pairs). Further, 
it would follow that positive and negative differences of equal absolute magnitude 
should occur with equal probability. If we were to order the differences according to 
their absolute values and rank them from smallest to largest, the expected rank sums 
for the negative and positive differences would be equal. Sizable differences in the 
sums of the ranks assigned to the positive and negative differences would provide 
evidence to indicate a shift in location for the two distributions. 

To carry out the Wilcoxon test, we calculate the differences (D_i) for each of the n 
pairs. Differences equal to zero are eliminated, and the number of pairs, n, is reduced 
accordingly. Then we rank the absolute values of the differences, assigning a 1 to the 
smallest, a 2 to the second smallest, and so on. If two or more absolute differences are 
tied for the same rank, then the average of the ranks that would have been assigned 
to these differences is assigned to each member of the tied group. For example, 
if two absolute differences are tied for ranks 3 and 4, then each receives rank 3.5, and 
the next highest absolute difference is assigned rank 5. Then we calculate the sum 
of the ranks (rank sum) for the negative differences and also calculate the rank sum 
for the positive differences. For a two-tailed test, we use T, the smaller of these two 
quantities, as a test statistic to test the null hypothesis that the two population relative 
frequency histograms are identical. The smaller the value of T is, the greater will be 
the weight of evidence favoring rejection of the null hypothesis. Hence, we will reject 
the null hypothesis if T is less than or equal to some value, say, T0. 

To detect the one-sided alternative, that the distribution of the X's is shifted to the 
right of that of the Y's, we use the rank sum T⁻ of the negative differences, and we 
reject the null hypothesis for small values of T⁻, say, T⁻ ≤ T0. If we wish to detect a 
shift of the distribution of the Y's to the right of the X's, we use the rank sum T⁺ of the 
positive differences as a test statistic, and we reject for small values of T⁺, say, T⁺ ≤ T0. 

The probability that T is less than or equal to some value T0 has been calculated for 
a combination of sample sizes and values of T0. These probabilities, given in Table 9, 
Appendix 3, can be used to find the rejection region for the test based on T. 
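
For small n, probabilities of this kind can be computed by direct enumeration. The sketch below assumes continuous data with no ties or zero differences, so that under the null hypothesis each of the ranks 1, 2, ..., n is attached to a negative difference with probability 1/2, independently of the others. The function name is our own, and the output is meant as an illustration rather than a substitute for Table 9.

```python
from itertools import product

def signed_rank_null_cdf(n):
    """Return a list cdf with cdf[t] = P(T_minus <= t) under H0 (no ties).

    Every one of the 2**n assignments of signs to the ranks 1..n is
    equally likely when the null hypothesis is true.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    for negative in product((0, 1), repeat=n):
        t = sum(rank for rank, neg in zip(range(1, n + 1), negative) if neg)
        counts[t] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / 2 ** n)
    return cdf

cdf = signed_rank_null_cdf(7)
for t0 in range(5):
    # One-tailed probability, and its double for T = min(T+, T-).
    print(t0, round(cdf[t0], 4), round(2 * cdf[t0], 4))
```

Because T⁺ and T⁻ have the same null distribution, the same list serves for either rank sum; doubling a lower-tail probability gives the two-tailed probability for small values of T = min(T⁺, T⁻).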

For example, suppose that you have n = 7 pairs and wish to conduct a two-tailed 
test of the null hypothesis that the two population relative frequency distributions are 
identical. Then, with α = .05, you would reject the null hypothesis for all values of 
T less than or equal to 2. The rejection region for the Wilcoxon signed-rank test for a 
paired experiment is always of this form: Reject the null hypothesis if T ≤ T0, where 
T0 is the critical value for T. Bounds for the attained significance level (p-value) 
are determined as follows. For a two-tailed test, if T = 3 is observed when n = 7, 
Table 9, Appendix 3, indicates that H0 would be rejected if α = .1, but not if α = .05. 
Thus, .05 < p-value < .1. For the one-sided alternative that the X's are shifted to 
the right of the Y's with n = 7 and α = .05, H0 is rejected if T = T⁻ ≤ 4. In this 
case, if T = T⁻ = 1, then .01 < p-value < .025. The test based on T, called the 
Wilcoxon signed-rank test, is summarized as follows. 


Wilcoxon Signed-Rank Test for a Matched-Pairs Experiment 


H0: The population distributions for the X's and Y's are identical. 

Ha: (1) The two population distributions differ in location (two-tailed), 
or (2) the population relative frequency distribution for the X's is 
shifted to the right of that for the Y's (one-tailed). 


Test statistic: 


1. For a two-tailed test, use T = min(T⁺, T⁻), where T⁺ = sum of the 
ranks of the positive differences and T⁻ = sum of the ranks of the 
negative differences. 

2. For a one-tailed test (to detect the one-tailed alternative just given), use 
the rank sum T⁻ of the negative differences.² 


Rejection region: 


1. For a two-tailed test, reject H0 if T ≤ T0, where T0 is the critical value 
for the two-sided test given in Table 9, Appendix 3. 

2. For a one-tailed test (as described earlier), reject H0 if T⁻ ≤ T0, where 
T0 is the critical value for the one-sided test. 


2. To detect a shift of the distribution of the Y's to the right of the distribution of the X's, use the rank 
sum T⁺, the sum of the ranks of the positive differences, and reject H0 if T⁺ ≤ T0. 


EXAMPLE 15.3 

Due to oven-to-oven variation, a matched-pairs experiment was used to test for 
differences in cakes prepared using mix A and mix B. Two cakes, one prepared using 
each mix, were baked in each of six different ovens (a total of 12 cakes). Test the 
hypothesis that there is no difference in population distributions of cake densities 
using the two mixes. What can be said about the attained significance level? 


Solution 

The original data and differences in densities (in ounces per cubic inch) for the six 
pairs of cakes are shown in Table 15.2. 

As with our other nonparametric tests, the null hypothesis to be tested is that the 
two population frequency distributions of cake densities are identical. The alternative 
hypothesis is that the distributions differ in location, which implies that a two-tailed 
test is required. 

Because the amount of data is small, we will conduct our test by using α = .10. 
From Table 9, Appendix 3, the critical value of T for a two-tailed test, α = .10, is 
T0 = 2. Hence, we will reject H0 if T ≤ 2. 

There is only one positive difference, and that difference has rank 3; therefore, 
T⁺ = 3. Because T⁺ + T⁻ = n(n + 1)/2 (why?), T⁻ = 21 − 3 = 18 and the 
observed value of T is min(3, 18) = 3. Notice that 3 exceeds the critical value of 
T, implying that there is insufficient evidence to indicate a difference in the two 
population frequency distributions of cake densities. Because we cannot reject H0 for 
α = .10, we can only say that p-value > .10. 


Table 15.2 Paired data and their differences for Example 15.3 


                  Difference,   Absolute     Rank of 
  A       B       A − B         Difference   Absolute Difference 
 .135    .129      .006         .006         3 
 .102    .120     −.018         .018         5 
 .108    .112     −.004         .004         1.5 
 .141    .152     −.011         .011         4 
 .131    .135     −.004         .004         1.5 
 .144    .163     −.019         .019         6 
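
The ranking in Table 15.2 can be reproduced with a short Python sketch. The helper names are our own, and the densities are entered in thousandths of an ounce per cubic inch (an assumption made so that the tied differences compare exactly as integers instead of as rounded floating-point values).

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def signed_rank_sums(x, y):
    """Return (T_plus, T_minus) for paired data, after deleting zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus

# Cake densities from Table 15.2, in thousandths of an ounce per cubic inch.
mix_a = [135, 102, 108, 141, 131, 144]
mix_b = [129, 120, 112, 152, 135, 163]
t_plus, t_minus = signed_rank_sums(mix_a, mix_b)
t = min(t_plus, t_minus)
print(t_plus, t_minus, t)  # 3.0 18.0 3.0
```

The two rank sums add to n(n + 1)/2 = 21, which provides a quick arithmetic check on the ranking.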


Although Table 9, Appendix 3, is applicable for values of n (the number of data 
pairs) as large as n = 50, it is worth noting that T⁺ (or T⁻) will be approximately 
normally distributed when the null hypothesis is true and n is large (say, 
25 or more). This enables us to construct a large-sample Z test, where if T = T⁺, 

$$E(T^+) = \frac{n(n+1)}{4} \quad \text{and} \quad V(T^+) = \frac{n(n+1)(2n+1)}{24}.$$

Then the Z statistic 

$$Z = \frac{T^+ - E(T^+)}{\sqrt{V(T^+)}} = \frac{T^+ - [n(n+1)/4]}{\sqrt{n(n+1)(2n+1)/24}}$$

can be used as a test statistic. Thus, for a two-tailed test and α = .05, we would reject 
the hypothesis of identical population distributions when |z| > 1.96. For a one-tailed 
test that the distribution of the X's is shifted to the right (left) of the distribution of 
the Y's, reject H0 when z > z_α (z < −z_α). 


A Large-Sample Wilcoxon Signed-Rank Test for a Matched-Pairs 
Experiment: n > 25 


Null hypothesis: H0: The population relative frequency distributions for 
the X's and Y's are identical. 

Alternative hypothesis: (1) Ha: The two population relative frequency 
distributions differ in location (a two-tailed test), 

or (2) the population relative frequency distribution for the X's is shifted to 
the right (or left) of the relative frequency distribution of the Y's (one-tailed 
tests). 

Test statistic: Z = [T⁺ − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]. 

Rejection region: Reject H0 if z > z_{α/2} or z < −z_{α/2} for a two-tailed test. 
To detect a shift in the distributions of the X's to the right of the Y's, reject 
H0 when z > z_α. To detect a shift in the opposite direction, reject H0 if 
z < −z_α. 
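
A minimal Python sketch of the large-sample statistic follows. The values n = 30 and T⁺ = 330 are hypothetical and serve only to show the arithmetic; the function names are our own.

```python
from math import erf, sqrt

def signed_rank_z(t_plus, n):
    """Z = [T+ - n(n+1)/4] / sqrt(n(n+1)(2n+1)/24)."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (t_plus - mean) / sqrt(variance)

def two_tailed_p(z):
    """Two-tailed p-value from the standard normal distribution."""
    upper = 1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    return 2.0 * upper

# Hypothetical values: n = 30 untied pairs with T+ = 330.
z = signed_rank_z(330, 30)
print(round(z, 3), round(two_tailed_p(z), 4))
```

Here E(T⁺) = 232.5 and V(T⁺) = 2363.75, so the observed T⁺ lies about two standard deviations above its null mean.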


Exercises 


If a matched-pairs experiment using n pairs of observations is conducted, if T⁺ = the sum of 
the ranks of the absolute values of the positive differences, and T⁻ = the sum of the ranks of 
the absolute values of the negative differences, why is T⁺ + T⁻ = n(n + 1)/2? 


Refer to Exercise 15.10. If T⁺ has been calculated, what is the easiest way to determine the 
value of T⁻? If T⁺ > n(n + 1)/4, is T = T⁺ or T⁻? Why? 


The accompanying table gives the scores of a group of 15 students in mathematics and art. 


Student Math Art | Student Math Art 
1 22 53 | 9 62 55 
2 37 68 | 10 65 74 
3 36 42 | 11 66 68 
4 38 49 | 12 56 64 
5 42 51 | 13 66 67 
6 58 65 | 14 67 73 
7 58 51 | 15 62 65 
8 60 71 | 


a Use Wilcoxon's signed-rank test to determine if the locations of the distributions of scores 
for these students differ significantly for the two subjects. Give bounds for the p-value and 


indicate the appropriate conclusion with α = .05. 


b State specific null and alternative hypotheses for the test that you conducted in part (a). 


Refer to Exercise 15.4. What answers are obtained if Wilcoxon's signed-rank test is used in 
analyzing the data? Compare these answers with the answers obtained in Exercise 15.4. 


Refer to Exercise 15.6(a). Answer the question by using the Wilcoxon signed-rank test. 


Eight subjects were asked to perform a simple puzzle-assembly task under customary conditions 
and under conditions of stress. During the stressful condition, the subjects were told that a 
mild shock would be delivered 3 minutes after the start of the experiment and every 
30 seconds thereafter until the task was completed. Blood pressure readings were taken 
under both conditions. Data in the accompanying table represent the highest reading during 
the experiment. 


Subject Normal Stress 
1 126 130 
2 117 118 
3 115 125 
4 118 120 
5 118 121 
6 128 125 
7 125 130 
8 120 120 


Do the data present sufficient evidence to indicate higher blood pressure readings during 
conditions of stress? Analyze the data by using the Wilcoxon signed-rank test for a matched- 
pairs experiment. Give the appropriate p-value. 


Two methods, A and B, for controlling traffic were employed at each of n = 12 intersections 
for a period of 1 week. The numbers of accidents occurring during this time period are recorded 
in the following table. The order of use (which method was employed for the first week) was 
randomly chosen for each intersection. 


Method Method Method Method 
Intersection A B Intersection A B 
1 5 4 7 2 3 
2 6 4 8 4 1 
3 8 9 9 7 9 
4 3 2 10 5 2 
5 6 3 11 6 5 
6 1 0 12 1 1 


a Analyze these data using the sign test. 


b Analyze these data using the Wilcoxon signed-rank test for a matched-pairs experiment. 


Dental researchers have developed a new material for preventing cavities, a plastic sealant that 
is applied to the chewing surfaces of teeth. To determine whether the sealant is effective, it 
was applied to half of the teeth of each of 12 school-age children. After 2 years, the number 
of cavities in the sealant-coated teeth and in the untreated teeth were counted. The results are 
given in the accompanying table. Is there sufficient evidence to indicate that sealant-coated 
teeth are less prone to cavities than are untreated teeth? Test using α = 0.05. 


Child Sealant-Coated Untreated | Child Sealant-Coated Untreated 
1 3 3 7 1 5 
2 1 3 8 2 0 
3 0 2 9 1 6 
4 4 5 10 0 0 
5 1 0 11 0 3 
6 0 1 12 4 3 


Refer to Exercise 12.16. With α = .01, use the Wilcoxon signed-rank test to see if there was a 
significant loss in muck depth between the beginning and end of the study. 


Suppose that Y1, Y2, ..., Yn is a random sample from a continuous distribution function F(y). 
It is desired to test a hypothesis concerning the median ξ of F(y). Construct a test of H0: ξ = ξ0 
against Ha: ξ ≠ ξ0, where ξ0 is a specified constant. 


a Use the sign test. 
b Use the Wilcoxon signed-rank test. 


The spokesperson for an organization supporting property-tax reductions in a certain section 
of a city stated that the median annual income for household heads in that section was $15,000. 
A random sample of ten household heads from that section revealed the following annual 
incomes: 


14,800 16,900 18,000 19,100 13,200 
18,500 20,000 19,200 15,100 16,500 


With α = .10, test the hypothesis that the median income for the population from that section 
is $15,000 against the alternative that it is greater than $15,000. 


a Use the sign test. 
b Use the Wilcoxon signed-rank test. 


Using Ranks for Comparing 
Two Population Distributions: 
Independent Random Samples 


A statistical test for comparing two populations based on independent random samples, 
the rank-sum test, was proposed by Frank Wilcoxon in 1945. Again, we assume 
that we are interested in testing whether the two populations have the same distribution 
versus the shift (or location) alternative (see Section 15.2). Suppose that you were 
to select independent random samples of n1 and n2 observations from populations I 
and II, respectively. Wilcoxon's idea was to combine the n1 + n2 = n observations 
and rank them, in order of magnitude, from 1 (the smallest) to n (the largest). Ties 
are treated as in Section 15.4. That is, if two or more observations are tied for the 
same rank, the average of the ranks that would have been assigned to these observations 
is assigned to each member of the tied group. If the observations were selected 
from identical populations, the rank sums for the samples should be more or less 
proportional to the sample sizes n1 and n2. For example, if n1 and n2 were equal, you 
would expect the rank sums to be nearly equal. In contrast, if the observations in one 
population (say, population I) tended to be larger than those in population II, the 
observations in sample I would tend to receive the highest ranks and sample I would 
have a larger than expected rank sum. Thus (sample sizes being equal), if one rank 
sum is very large (and, correspondingly, the other is very small), it may indicate a 
statistically significant difference between the locations of the two populations. 

Mann and Whitney proposed an equivalent statistical test in 1947 that also used the 
rank sums of two samples. Because the Mann-Whitney U test and tables of critical 
values of U occur so often in the literature, we will explain its use in Section 15.6 and 
will give several examples of its applications. In this section, we illustrate the logic 
of the rank-sum test and demonstrate how to determine the rejection region for the 
test and the value of α. 


EXAMPLE 15.4 

The bacteria counts per unit volume are shown in Table 15.3 for two types of cultures, 
I and II. Four observations were made for each culture. Let n1 and n2 represent the 
number of observations in samples I and II, respectively. 

For the data given in Table 15.3, the corresponding ranks are as shown in Table 15.4. 
Do these data present sufficient evidence to indicate a difference in the locations of 
the population distributions for cultures I and II? 


Table 15.3 Data for Example 15.4 


I II 

27 32 
31 29 
26 35 
25 28 


Solution 

Let W equal the rank sum for sample I (for this sample, W = 12). Certainly, very 
small or very large values of W provide evidence to indicate a difference between the 
locations of the two population distributions; hence W, the rank sum, can be employed 
as a test statistic. 

The rejection region for a given test is obtained in the same manner as for the sign 
test. We start by selecting the most contradictory values of W as the rejection region 
and add to these until α is of acceptable size. 


Table 15.4 Ranks 


I II 
3 7 
6 5 
2 8 
1 4 


Rank Sum 12 24 

The minimum rank sum includes the ranks 1, 2, 3, 4, or W = 10. Similarly, the 
maximum includes the ranks 5, 6, 7, 8, with W = 26. Therefore, we include these 
two values of W in the rejection region. What is the corresponding value of α? 

Finding the value of α is a probability problem that can be solved by using the 
methods of Chapter 2. If the populations are identical, every permutation of the eight 
ranks represents a sample point and is equally likely. Then, α is the sum of the 
probabilities of the sample points (arrangements) that imply W = 10 or W = 26. 
The total number of permutations of the eight ranks is 8!. The number of different 
arrangements of the ranks 1, 2, 3, 4 in sample I with the 5, 6, 7, 8 of sample II is 
4! × 4!. Similarly, the number of arrangements that place the maximum value of W 
in sample I (ranks 5, 6, 7, 8) is 4! × 4!. Then, the probability that W = 10 or W = 26 
is 

$$p(10) + p(26) = \frac{(2)(4!)(4!)}{8!} = \frac{2}{70} = \frac{1}{35} = .029.$$


If this value of α is too small, the rejection region can be enlarged to include 
the next smallest and next largest rank sums, W = 11 and W = 25. The rank sum 
W = 11 includes the ranks 1, 2, 3, 5, and 

$$p(11) = \frac{4!\,4!}{8!} = \frac{1}{70}.$$

Similarly, 

$$p(25) = \frac{1}{70}.$$

Then, 

$$\alpha = p(10) + p(11) + p(25) + p(26) = \frac{2}{35} = .057.$$


Expansion of the rejection region to include 12 and 24 substantially increases the 
value of α. The set of sample points giving a rank sum of 12 includes all sample points 
associated with rankings of (1, 2, 3, 6) and (1, 2, 4, 5). Thus, 

$$p(12) = \frac{(2)(4!)(4!)}{8!} = \frac{1}{35}$$

and 

$$\alpha = p(10) + p(11) + p(12) + p(24) + p(25) + p(26)$$

$$= \frac{1}{70} + \frac{1}{70} + \frac{1}{35} + \frac{1}{35} + \frac{1}{70} + \frac{1}{70} = \frac{4}{35} = .114.$$


This value of α might be considered too large for practical purposes. Hence, we are 
better satisfied with the rejection region W = 10, 11, 25, and 26. 
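
The three values of α found above can be checked by enumeration. The sketch below is one way to do so, with a function name of our own choosing. It lists the 70 equally likely sets of four ranks that sample I can receive instead of all 8! permutations; the two counts are equivalent because each set corresponds to 4! × 4! permutations.

```python
from fractions import Fraction
from itertools import combinations

def rank_sum_pmf(n1, n2):
    """Exact null distribution of W, the rank sum for sample I (no ties)."""
    counts = {}
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        w = sum(ranks)
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

pmf = rank_sum_pmf(4, 4)
alpha_2 = pmf[10] + pmf[26]              # rejection region {10, 26}
alpha_4 = alpha_2 + pmf[11] + pmf[25]    # rejection region {10, 11, 25, 26}
alpha_6 = alpha_4 + pmf[12] + pmf[24]    # rejection region {10, 11, 12, 24, 25, 26}
print(alpha_2, alpha_4, alpha_6)  # 1/35 2/35 4/35
```

Exact fractions are used so that the results can be compared directly with 1/35, 2/35, and 4/35.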

The rank sum for the sample, W = 12, does not fall in this preferred rejection 
region, so we do not have sufficient evidence to reject the hypothesis that the population 
distributions of bacteria counts for the two cultures are identical. 


The Mann-Whitney U Test: 
Independent Random Samples 


The Mann-Whitney statistic U is obtained by ordering all (n1 + n2) observations 
according to their magnitude and counting the number of observations in sample 
I that precede each observation in sample II. The statistic U is the sum of these 
counts. In the remainder of this section, we denote the observations in sample I as 
x1, x2, ..., x_{n1} and the observations in sample II as y1, y2, ..., y_{n2}. 

For example, the eight ordered observations of Example 15.4 are 


25    26    27    28    29    31    32    35 
x(1)  x(2)  x(3)  y(1)  y(2)  x(4)  y(3)  y(4) 

The smallest y observation is y(1) = 28, and u1 = 3 x's precede it. Similarly, 
u2 = 3 x's precede y(2) = 29, and u3 = 4 and u4 = 4 x's precede y(3) = 32 and 
y(4) = 35, respectively. Then, 


U = u1 + u2 + u3 + u4 = 3 + 3 + 4 + 4 = 14. 
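
The counting definition of U translates directly into code. The following sketch, with a function name of our own and no allowance for tied observations, reproduces the counts for the bacteria data of Example 15.4.

```python
def precede_counts(sample1, sample2):
    """For each y in sample2, taken in increasing order, count the x's in sample1 below it."""
    return [sum(1 for x in sample1 if x < y) for y in sorted(sample2)]

culture_1 = [27, 31, 26, 25]
culture_2 = [32, 29, 35, 28]
counts = precede_counts(culture_1, culture_2)
u = sum(counts)
print(counts, u)  # [3, 3, 4, 4] 14
```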


Very large or very small values of U imply a separation of the ordered x's and y's 
and thus provide evidence to indicate a difference (a shift of location) between the 
distributions of populations I and II. 

As noted in Section 15.5, the Mann-Whitney U statistic is related to Wilcoxon's 
rank sum. In fact, it can be shown (Exercise 15.75) that 


Formula for the Mann-Whitney U Statistic 

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W,$$

where n1 = number of observations in sample I, 
n2 = number of observations in sample II, 
W = rank sum for sample I. 
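
As a numerical check on this formula (not a proof), the sketch below computes U both from the rank sum W and by direct counting. The helper names and the second, smaller data set are hypothetical, and tied observations are assumed not to occur.

```python
def rank_sum(sample1, sample2):
    """Rank sum W of sample1 within the combined sample (assumes no tied values)."""
    combined = sorted(sample1 + sample2)
    return sum(combined.index(x) + 1 for x in sample1)

def u_from_rank_sum(n1, n2, w):
    """U = n1*n2 + n1(n1 + 1)/2 - W."""
    return n1 * n2 + n1 * (n1 + 1) // 2 - w

def u_by_counting(sample1, sample2):
    """Number of (x, y) pairs in which the sample I value precedes the sample II value."""
    return sum(1 for y in sample2 for x in sample1 if x < y)

# Bacteria counts of Example 15.4.
culture_1 = [27, 31, 26, 25]
culture_2 = [32, 29, 35, 28]
w = rank_sum(culture_1, culture_2)
print(w, u_from_rank_sum(4, 4, w), u_by_counting(culture_1, culture_2))  # 12 14 14

# A second, hypothetical pair of samples with unequal sizes.
s1, s2 = [1.2, 3.4, 0.5], [2.2, 5.1]
print(u_from_rank_sum(3, 2, rank_sum(s1, s2)), u_by_counting(s1, s2))  # 5 5
```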


As you can see from the formula for U, U is small when W is large, a situation 
likely to occur when the distribution of population I is shifted to the right of the 
distribution of population II. Consequently, to conduct a one-tailed test to detect a 
shift in the distribution of population I to the right of the distribution of population 
II, you will reject the null hypothesis of no difference in population distributions if 
U ≤ U0, where α = P(U ≤ U0) is of suitable size. 

Some useful results about the distribution of U: 


1. The possible values of U are 0, 1, 2, ..., n1n2. 

2. The distribution of U is symmetric about (n1n2)/2. That is, for any a > 0, 
P[U ≤ (n1n2)/2 − a] = P[U ≥ (n1n2)/2 + a]. 

3. The result in (2) implies that P(U ≤ U0) = P(U ≥ n1n2 − U0). 


If you wish to conduct a one-tailed test to detect a shift of the distribution of population 
I to the left of the distribution of population II, you would reject H0 if U is very large, 
specifically if U ≥ n1n2 − U0, where U0 is such that α = P(U ≥ n1n2 − U0) = 
P(U ≤ U0) is of acceptable size. 

Table 8, Appendix 3, gives the probability that an observed value of U is less than 
or equal to various values, U0. This is the value of α for a one-tailed test. To conduct a 
two-tailed test (that is, to detect a difference in the locations of populations I and II), 
reject H0 if U ≤ U0 or U ≥ n1n2 − U0, where P(U ≤ U0) = α/2. 

To see how to locate the rejection region for the Mann-Whitney U test, suppose that 
n1 = 4 and n2 = 5. Then, you would consult the third section of Table 8, Appendix 3 
(the one corresponding to n2 = 5). Notice that the table is constructed assuming that 
n1 ≤ n2. That is, you must always identify the smaller sample as sample I. From the 
table we see, for example, P(U ≤ 2) = .0317 and P(U ≤ 3) = .0556. So if you 
want to conduct a lower-tail Mann-Whitney U test with n1 = 4 and n2 = 5 for α near 
.05, you should reject the null hypothesis of equality of population relative frequency 
distributions when U ≤ 3. The probability of a type I error for the test is α = .0556. 
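
Probabilities such as these can be reproduced by enumeration for small samples. The sketch below assumes no ties, so that under the null hypothesis every set of n1 ranks is equally likely for sample I; the function name is our own, and the approach is intended only as an illustration of where tabulated values of this kind come from.

```python
from fractions import Fraction
from itertools import combinations

def u_null_cdf(n1, n2, u0):
    """Exact P(U <= u0) under H0, using U = n1*n2 + n1(n1 + 1)/2 - W."""
    hits = total = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks)
        total += 1
        if u <= u0:
            hits += 1
    return Fraction(hits, total)

p2 = u_null_cdf(4, 5, 2)
p3 = u_null_cdf(4, 5, 3)
print(p2, round(float(p2), 4))  # 2/63 0.0317
print(p3, round(float(p3), 4))  # 1/18 0.0556
```

The enumeration grows quickly with n1 and n2, so for larger samples a table or a normal approximation is the practical choice.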

When applying the test to a set of data, you may find that some of the observations 
are of equal value. Ties in the observations can be handled by averaging the ranks 
that would have been assigned to the tied observations and assigning this average to 
each. Thus, if three observations are tied and are due to receive ranks 3, 4, and 5, we 
assign rank 4 to all three. The next observation in the sequence receives rank 6, and 
ranks 3 and 5 do not appear. Similarly, if two observations are tied for ranks 3 and 4, 
each receives rank 3.5, and ranks 3 and 4 do not appear. 

Table 8, Appendix 3, can also be used to find the observed significance level for a 
test. For example, if n1 = 5, n2 = 5, and U = 4, the p-value for a one-tailed test that 
the distribution of population I is shifted to the right of the distribution of population II is 


P(U ≤ 4) = .0476. 
If the test is two-tailed, the p-value is 
2(.0476), or .0952. 


The Mann-Whitney U Test 
Population I is the population from which the smaller sample was taken. 


Null hypothesis: H0: The distributions of populations I and II are identical. 
Alternative hypothesis: (1) Ha: The distributions of populations I and II 
have different locations (a two-tailed test), 

or (2) the distribution of population I is shifted to the right of the distribution 
of population II, or (3) the distribution of population I is shifted to the left 
of the distribution of population II. 

Test statistic: U = n1n2 + [n1(n1 + 1)]/2 − W. 

Rejection region: (1) For the two-tailed test and a given value of α, reject 
H0 if U ≤ U0 or U ≥ n1n2 − U0, where P(U ≤ U0) = α/2. [Note: Observe 
that U0 is the value such that P(U ≤ U0) is equal to half of α.] 

(2) To test that population I is shifted to the right of population II with a given 
value of α, reject H0 if U ≤ U0, where P(U ≤ U0) = α. 

(3) To test that population I is shifted to the left of population II with a given 
value of α, reject H0 if U ≥ n1n2 − U0, where P(U ≤ U0) = α. 

Assumptions: Samples have been randomly and independently selected 
from their respective populations. Ties in the observations can be handled 
by averaging the ranks that would have been assigned to the tied observations 
and assigning this average rank to each. Thus, if three observations are tied 
and are due to receive ranks 3, 4, and 5, we assign rank 4 to all three. 


EXAMPLE 15.5 Test the hypothesis that there is no difference in the locations of the population
distributions for the bacteria count data of Example 15.4.

Solution We have already noted that the Mann-Whitney U test and the Wilcoxon rank-sum
test are equivalent, so we should reach the same conclusions here as we did in
Example 15.4. Recall that the alternative hypothesis was that the distributions of bacteria
counts for cultures I and II differed and that this implied a two-tailed test. Thus,
because Table 8, Appendix 3, gives values of $P(U \le U_0)$ for specified sample
sizes and values of $U_0$, we must double the tabulated value to find $\alpha$. Suppose, as in
Example 15.4, that we desire a value of $\alpha$ near .05. Checking Table 8 for $n_1 = n_2 = 4$,
we find $P(U \le 1) = .0286$. The appropriate rejection region for the two-tailed test
is $U \le 1$ or $U \ge n_1 n_2 - 1 = 16 - 1 = 15$, for which $\alpha = 2(.0286) = .0572$
or, rounding to three decimal places, $\alpha = .057$ (the same value of $\alpha$ obtained for
Example 15.4).

For the bacteria data, the rank sum is $W = 12$. Then,

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W = (4)(4) + \frac{4(4 + 1)}{2} - 12 = 14.$$

The calculated value of U does not fall in the rejection region. Hence, there is not
sufficient evidence to show a difference in the locations of the population distributions
of bacteria counts for cultures I and II. The p-value is given by $2P(U \ge 14) =
2P(U \le 2) = 2(.0571) = .1142$.


EXAMPLE 15.6 An experiment was conducted to compare the strengths of two types of kraft papers,
one a standard kraft paper of a specified weight and the other the same standard kraft
paper treated with a chemical substance. Ten pieces of each type of paper, randomly
selected from production, produced the strength measurements shown in Table 15.5.
Test the hypothesis of no difference in the distributions of strengths for the two types
of paper against the alternative hypothesis that the treated paper tends to be stronger.

Solution Both samples are of size 10, so either population (standard or treated) may be
designated as population I. We have identified the standard paper measurements as coming
from population I. In Table 15.5, the ranks are shown in parentheses alongside the
$n_1 + n_2 = 10 + 10 = 20$ strength measurements, and the rank sum W is given below
the first column.

Table 15.5 Data for Example 15.6

Standard, I        Treated, II
1.21 (2)           1.49 (15)
1.43 (12)          1.37 (7.5)
1.35 (6)           1.67 (20)
1.51 (17)          1.50 (16)
1.39 (9)           1.31 (5)
1.17 (1)           1.29 (3.5)
1.48 (14)          1.52 (18)
1.42 (11)          1.37 (7.5)
1.29 (3.5)         1.44 (13)
1.40 (10)          1.53 (19)

Rank Sum W = 85.5

Because we wish to detect a shift in the distribution of population I
(standard) to the left of the distribution of population II (treated), we will reject
the null hypothesis of no difference in population strength distributions when W is
excessively small. Because this situation occurs when U is large, we will conduct a
one-tailed statistical test and reject the null hypothesis when $U \ge n_1 n_2 - U_0$.

Suppose that we choose a value of $\alpha$ near .05. Then we can find $U_0$ by consulting
the portion of Table 8, Appendix 3, corresponding to $n_2 = 10$. The probability
$P(U \le U_0)$ nearest .05 is .0526 and corresponds to $U_0 = 28$. Hence, we will reject
if $U \ge (10)(10) - 28 = 72$.

Calculating U, we have

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W = (10)(10) + \frac{(10)(11)}{2} - 85.5 = 69.5.$$

As you can see, U is not greater than 72. Therefore, we cannot reject the null
hypothesis. At the $\alpha = .0526$ level of significance, there is not sufficient evidence to
indicate that the treated kraft paper is stronger than the standard. The p-value is given
by $P(U \ge 69.5) = P(U \le 30.5) = .0716$.
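
The rank sum and the U statistic for the kraft paper data can be reproduced with a short Python sketch. It assumes the tie-averaging rule described earlier in this section; the function names are illustrative, and the data are the 20 measurements of Table 15.5.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # average of ranks start+1, ..., end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (W, U), where W is the rank sum of sample1 in the combined ranking."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])
    u = n1 * n2 + n1 * (n1 + 1) / 2 - w
    return w, u


standard = [1.21, 1.43, 1.35, 1.51, 1.39, 1.17, 1.48, 1.42, 1.29, 1.40]
treated = [1.49, 1.37, 1.67, 1.50, 1.31, 1.29, 1.52, 1.37, 1.44, 1.53]

w, u = mann_whitney_u(standard, treated)
print(w, u)  # 85.5 69.5
```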


A simplified large-sample test ($n_1 > 10$ and $n_2 > 10$) can be obtained by using
the familiar Z statistic of Chapter 10. When the population distributions are identical,
it can be shown that the U statistic has the following expected value and variance:

$$E(U) = \frac{n_1 n_2}{2} \quad \text{and} \quad V(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$

Also, when $n_1$ and $n_2$ are large,

$$Z = \frac{U - E(U)}{\sigma_U}$$

has approximately a standard normal distribution. This approximation is adequate
when $n_1$ and $n_2$ both are greater than or equal to 10. Thus, for a two-tailed test with
$\alpha = .05$, we will reject the null hypothesis if $|z| > 1.96$.

The Z statistic yields the same conclusion as the exact U test for Example 15.6:

$$z = \frac{69.5 - [(10)(10)/2]}{\sqrt{[(10)(10)(10 + 10 + 1)]/12}} = \frac{69.5 - 50}{\sqrt{2100/12}}
= \frac{19.5}{\sqrt{175}} = \frac{19.5}{13.23} = 1.47.$$

For a one-tailed test with $\alpha = .05$ located in the upper tail of the z distribution, we
will reject the null hypothesis if $z > 1.645$. You can see that $z = 1.47$ does not fall
in the rejection region and that this test reaches the same conclusion as the exact U
test of Example 15.6.
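
The large-sample calculation can be sketched in Python as follows. The sketch uses only the standard library; the function names are illustrative, and the upper-tail normal probability is computed from the complementary error function rather than read from Table 4.

```python
import math


def mann_whitney_z(u, n1, n2):
    """Large-sample Z statistic: (U - E(U)) / sigma_U under the null hypothesis."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def upper_tail_normal(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = mann_whitney_z(69.5, 10, 10)
print(round(z, 2))              # 1.47
print(upper_tail_normal(z))     # approximate one-tailed p-value
```

The approximate one-tailed p-value obtained this way is close to, but not identical to, the exact value .0716 found from Table 8, as expected for a normal approximation at these sample sizes.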


The Mann-Whitney U Test for Large Samples: $n_1 > 10$ and $n_2 > 10$

Null hypothesis: $H_0$: The relative frequency distributions for populations
I and II are identical.

Alternative hypothesis: (1) $H_a$: The two populations' relative frequency
distributions differ in location (a two-tailed test),
or (2) the relative frequency distribution for population I is shifted to the
right (or left) of the relative frequency distribution for population II (a
one-tailed test).

Test statistic:

$$Z = \frac{U - (n_1 n_2 / 2)}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}.$$

Rejection region: Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ for a two-tailed test.
For a one-tailed test, place all $\alpha$ in one tail of the z distribution. To detect
a shift in the distribution of population I to the right of the distribution of
population II, reject $H_0$ when $z < -z_\alpha$. To detect a shift in the opposite
direction, reject $H_0$ when $z > z_\alpha$. Tabulated values of z are given in Table 4,
Appendix 3.


It may seem to you that the Mann-Whitney U test and the equivalent Wilcoxon
rank-sum test are not very efficient because they do not appear to use all the
information in the sample. Actually, theoretical studies have shown that this is not the case.
Suppose, for example, that all of the assumptions for a two-sample t test are met
when testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 > 0$. Because the two-sample t
test simply tests for a difference in location (see Section 15.2), we can use the
Mann-Whitney U statistic to test these same hypotheses. For a given $\alpha$ and $\beta$, the total
sample size required for the t test is approximately .95 times the total sample size
required for the Mann-Whitney U. Thus, the nonparametric procedure is almost as
good as the t test for the situation in which the t test is optimal. For many
nonnormal distributions, the nonparametric procedure requires fewer observations than
a corresponding parametric procedure would require to produce the same values
of $\alpha$ and $\beta$.


Exercises

15.21 Find the p-values associated with each of the following scenarios for testing $H_0$: populations
I and II have the same distribution.

a $H_a$: distribution of population I is shifted to the right of the distribution of population II;
$n_1 = 4$, $n_2 = 7$, $W = 34$.

b $H_a$: distribution of population I is shifted to the left of the distribution of population II;
$n_1 = 5$, $n_2 = 9$, $W = 38$.

c $H_a$: populations I and II differ in location; $n_1 = 3$, $n_2 = 6$, $W = 23$.


15.22 In some tests of healthy, elderly men, a new drug has restored their memories almost to the level
of young adults. The medication will soon be tested on patients with Alzheimer’s disease, the
fatal brain disorder that eventually destroys the minds of those afflicted. According to Dr. Gary
Lynch of the University of California, Irvine, the drug, called ampakine CX-516, accelerates
signals between brain cells and appears to significantly sharpen memory.³ In a preliminary test
on students in their early 20s and on men aged 65-70, the results were particularly striking. The
accompanying data are the numbers of nonsense syllables recalled after 5 minutes for ten men
in their 20s and ten men aged 65-70 who had been given a mild dose of ampakine CX-516.
Do the data provide sufficient evidence to conclude that there is a difference in the number of
nonsense syllables recalled by men in the two age groups when older men have been given
ampakine CX-516? Give the associated p-value.

Age Group                        Number of Syllables Recalled
20s                              1   7   6   8   6   9   2   10   3   6
65-70 (with ampakine CX-516)     1   9   6   8   7   8   5   7   10   3


15.23 Two plastics, each produced by a different process, were tested for ultimate strength. The
measurements in the accompanying table represent breaking loads in units of 1000 pounds
per square inch. Do the data present evidence of a difference between the locations of the
distributions of ultimate strengths for the two plastics? Test by using the Mann-Whitney U
test with a level of significance as near as possible to $\alpha = .10$.


Plastic 1 Plastic 2 


15.3 21.2 
18.7 22.4 
22.3 18.3 
17.6 19.3 
19.1 17.1 
14.8 27.7 


3. Source: “Alzheimer’s Test Set for New Memory Drug,” Press Enterprise (Riverside, Calif.), 18
November 1997, p. A-4.

15.24 The coded values for a measure of brightness in paper (light reflectivity), prepared by two
different processes, are as shown in the accompanying table for samples of size 9 drawn
randomly from each of the two processes. Do the data present sufficient evidence to indicate
a difference in locations of brightness measurements for the two processes? Give the attained
significance level.

A        B
6.1      9.1
9.2      8.2
8.7      8.6
8.9      6.9
7.6      7.5
7.1      4.9
9.5      8.3
8.3      7.8
9.0      8.9

a Use the Mann-Whitney U test.

b Use Student's t test.

c Give specific null and alternative hypotheses, along with any assumptions, for the tests
used in parts (a) and (b).


15.25 Fifteen experimental batteries were selected at random from a lot at pilot plant A, and 15
standard batteries were selected at random from production at plant B. All 30 batteries were
simultaneously placed under an electrical load of the same magnitude. The first battery to fail
was an A, the second a B, the third a B, and so on. The following sequence shows the order of
failure for the 30 batteries:

A B B B A B A A B B B B A B A
B B B B A A B A A A B A A A A

Using the large-sample theory for the U test, determine whether there is sufficient evidence to
permit the experimenter to conclude that the lengths of life for the experimental batteries tend
to be greater than the lengths of life for the standard batteries. Use $\alpha = .05$.


15.26 Refer to Exercises 8.88 and 8.89. Is there sufficient evidence to indicate a difference in the
populations of LC50 measurements for DDT and Diazinon? What is the attained significance
level associated with the U statistic? What do you conclude when $\alpha = .10$?


15.27 Given below are wing stroke frequencies⁴ for samples of two species of Euglossine bees. Four
bees of the species Euglossa mandibularis Friese and six of the species Euglossa imperialis
Cockerell are shown in the accompanying table.

Wing Stroke Frequencies

E. mandibularis Friese     E. imperialis Cockerell
235                        180
225                        169
190                        180
188                        185
                           178
                           183

a Do the data present sufficient evidence to indicate that the distributions of wing stroke
frequencies differ for the two species? Use the test based on the Mann-Whitney U statistic
with $\alpha$ as close to, but not exceeding, .10.

b Give the approximate p-value associated with the test.

4. Source: T. M. Casey, M. L. May, and K. R. Morgan, “Flight Energetics of Euglossine Bees in Relation
to Morphology and Wing Stroke Frequency,” Journal of Experimental Biology 116 (1985).


15.28 Cancer treatment using chemotherapy employs chemicals that kill both cancer cells and normal 
cells. In some instances, the toxicity of the cancer drug—that is, its effect on normal cells— 
can be reduced by the simultaneous injection of a second drug. A study was conducted to 
determine whether a particular drug injection was beneficial in reducing the harmful effects 
of a chemotherapy treatment on the survival time for rats. Two randomly selected groups of 
rats, 12 rats in each group, were used for the experiment. Both groups, call them A and B, 
received the toxic drug in a dosage large enough to cause death, but group B also received 
the antitoxin that was intended to reduce the toxic effect of the chemotherapy on normal cells. 
The test was terminated at the end of 20 days, or 480 hours. The lengths of survival time for 
the two groups of rats, to the nearest 4 hours, are shown in the following table. Do the data 
provide sufficient evidence to indicate that rats receiving the antitoxin tended to survive longer 
after chemotherapy than those not receiving the antitoxin? Use the Mann-Whitney U test with
a value of $\alpha$ near .05.


Only Chemotherapy (A) Chemotherapy plus Drug (B) 


84 140 
128 184 
168 368 
92 96 
184 480 
92 188 
76 480 
104 244 
72 440 
180 380 
144 480 
120 196 


15.7 The Kruskal-Wallis Test for the One-Way Layout


In Section 13.3, we presented an analysis of variance (ANOVA) procedure to com- 
pare the means of k populations. The resultant F test was based on the assumption 
that independent random samples were taken from normal populations with equal 
variances. That is, as discussed in Section 15.2, we were interested in testing whether 
all the populations had the same distribution versus the alternative that the popula- 
tions differed in location. A key element in the development of the procedure was the 
quantity identified as the sum of squares for treatments, SST. As we pointed out in the 
discussion in Section 13.3, the larger the value of SST, the greater will be the weight 
of evidence favoring rejection of the null hypothesis that the means are all equal. In 
this section, we present a nonparametric technique to test whether the populations 
differ in location. Like the other nonparametric techniques discussed in this chapter, 
the Kruskal-Wallis procedure requires no assumptions about the actual form of the
probability distributions. 

As in Section 13.3, we assume that independent random samples have been drawn
from k populations that differ only in location. However, we need not assume that
these populations possess normal distributions. For complete generality, we permit
the sample sizes to be unequal, and we let $n_i$, for $i = 1, 2, \ldots, k$, represent the
size of the sample drawn from the ith population. Analogously to the procedure of
Section 15.5, combine all the $n_1 + n_2 + \cdots + n_k = n$ observations and rank them from
1 (the smallest) to n (the largest). Ties are treated as in previous sections. That is, if
two or more observations are tied for the same rank, then the average of the ranks that
would have been assigned to these observations is assigned to each member of the
tied group. Let $R_i$ denote the sum of the ranks of the observations from population
i and let $\bar{R}_i = R_i/n_i$ denote the corresponding average of the ranks. If $\bar{R}$ equals
the overall average of all of the ranks, consider the rank analogue of SST, which is
computed by using the ranks rather than the actual values of the measurements:


$$V = \sum_{i=1}^{k} n_i (\bar{R}_i - \bar{R})^2.$$

If the null hypothesis is true and the populations do not differ in location, we would
expect the $\bar{R}_i$ values to be approximately equal and the resulting value of V to be
relatively small. If the alternative hypothesis is true, we would expect this to be
exhibited in differences among the $\bar{R}_i$ values, leading to a large value
for V. Notice that $\bar{R}$ = (sum of the first n integers)/n = $[n(n + 1)/2]/n = (n + 1)/2$
and thus that

$$V = \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{n + 1}{2} \right)^2.$$

Instead of focusing on V, Kruskal and Wallis (1952) considered the statistic
$H = 12V/[n(n + 1)]$, which may be rewritten (see Exercise 15.35) as

$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1).$$


As previously noted, the null hypothesis of equal locations is rejected in favor of the
alternative that the populations differ in location if the value of H is large. Thus, the
corresponding $\alpha$-level test calls for rejection of the null hypothesis in favor of the
alternative if $H > h(\alpha)$, where $h(\alpha)$ is such that, when $H_0$ is true, $P[H > h(\alpha)] = \alpha$.

If the underlying distributions are continuous and if there are no ties among the n
observations, the null distribution of H can (tediously) be found by using the methods
of Chapter 2. We can find the distribution of H for any values of k and $n_1, n_2, \ldots, n_k$
by calculating the value of H for each of the n! equally likely permutations of the ranks
of the n observations (see Exercise 15.36). These calculations have been performed
and tables developed for some relatively small values of k and for $n_1, n_2, \ldots, n_k$ [see,
for example, Table A.12 of Hollander and Wolfe (1999)].

Kruskal and Wallis showed that if the $n_i$ values are “large” the null distribution of
H can be approximated by a $\chi^2$ distribution with k − 1 degrees of freedom (df). This
approximation is generally accepted to be adequate if each of the $n_i$ values is greater
than or equal to 5. Our examples and exercises are all such that this large-sample
approximation is adequate. If you wish to use the Kruskal-Wallis analysis for smaller
data sets, where this large-sample approximation is not adequate, refer to Hollander
and Wolfe (1999) to obtain the appropriate critical values.

We summarize the large-sample Kruskal-Wallis procedure as follows.


Kruskal-Wallis Test Based on H for Comparing k Population Distributions

Null hypothesis: $H_0$: The k population distributions are identical.

Alternative hypothesis: $H_a$: At least two of the population distributions
differ in location.

Test statistic: $H = \{12/[n(n + 1)]\} \sum_{i=1}^{k} R_i^2/n_i - 3(n + 1)$, where

$n_i$ = number of measurements in the sample from population i,

$R_i$ = rank sum for sample i, where the rank of each measurement is computed
according to its relative size in the overall set of $n = n_1 + n_2 + \cdots + n_k$
observations formed by combining the data from all k samples.

Rejection region: Reject $H_0$ if $H > \chi^2_\alpha$ with (k − 1) df.

Assumptions: The k samples are randomly and independently drawn. There
are five or more measurements in each sample.


EXAMPLE 15.7 A quality control engineer has selected independent samples from the output of three
assembly lines in an electronics plant. For each line, the output of ten randomly
selected hours of production was examined for defects. Do the data in Table 15.6
provide evidence that the probability distributions of the number of defects per hour
of output differ in location for at least two of the lines? Use $\alpha = .05$. Also give the
p-value associated with the test.

Table 15.6 Data for Example 15.7

     Line 1              Line 2              Line 3
Defects   Rank      Defects   Rank      Defects   Rank
   6        5         34       25         13       9.5
  38       27         28       19         35      26
   3        2         42       30         19      15
  17       13         13        9.5        4       3
  11        8         40       29         29      20
  30       21         31       22          0       1
  15       11          9        7          7       6
  16       12         32       23         33      24
  25       17         39       28         18      14
   5        4         27       18         24      16

Solution In this case, $n_1 = n_2 = n_3 = 10$ and $n = 30$. Summing the ranks in each column of
Table 15.6 gives $R_1 = 120$, $R_2 = 210.5$, and $R_3 = 134.5$. Thus,

$$H = \frac{12}{30(31)} \left[ \frac{(120)^2}{10} + \frac{(210.5)^2}{10} + \frac{(134.5)^2}{10} \right] - 3(31) = 6.097.$$

Because all the $n_i$ values are greater than or equal to 5, we may use the approximation
for the null distribution of H and reject the null hypothesis of equal locations if
$H > \chi^2_{.05}$ based on k − 1 = 2 df. We consult Table 6, Appendix 3, to determine that
$\chi^2_{.05} = 5.99147$. Thus, we reject the null hypothesis at the $\alpha = .05$ level and conclude
that at least one of the three lines tends to produce a greater number of defects than
the others.

According to Table 6, Appendix 3, the value of H = 6.097 leads to rejection of
the null hypothesis if $\alpha = .05$ but not if $\alpha = .025$. Thus, .025 < p-value < .05.
The applet Chi-Square Probability and Quantiles can be used to establish that the
approximate p-value = $P(\chi^2 > 6.097) = .0474$.
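
The computation in Example 15.7 can be checked with a brief Python sketch. It implements the H statistic exactly as defined in this section (with average ranks for ties and no further tie correction); the function names are illustrative. The p-value line relies on the fact that a $\chi^2$ random variable with 2 df has upper-tail probability $e^{-h/2}$, so it applies only when k = 3.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # average of ranks start+1, ..., end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """H = 12/[n(n+1)] * sum(R_i^2 / n_i) - 3(n+1), ranking the pooled data."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for sample in samples:
        r_i = sum(ranks[start:start + len(sample)])  # rank sum for this sample
        total += r_i ** 2 / len(sample)
        start += len(sample)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


line1 = [6, 38, 3, 17, 11, 30, 15, 16, 25, 5]
line2 = [34, 28, 42, 13, 40, 31, 9, 32, 39, 27]
line3 = [13, 35, 19, 4, 29, 0, 7, 33, 18, 24]

h = kruskal_wallis_h([line1, line2, line3])
p_value = math.exp(-h / 2)  # chi-square upper tail; valid for 2 df only

print(round(h, 3), round(p_value, 4))  # 6.097 0.0474
```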


It can be shown that, if we wish to compare only k = 2 populations, the Kruskal-Wallis
test is equivalent to the Wilcoxon rank-sum two-sided test presented in
Section 15.5. If data are obtained from a one-way layout involving k > 2 populations but
we wish to compare a particular pair of populations, the Wilcoxon rank-sum test (or
the equivalent Mann-Whitney U test of Section 15.6) can be used for this purpose.
Notice that the analysis based on the Kruskal-Wallis H statistic does not require
knowledge of the actual values of the observations. We need only know the ranks
of the observations to complete the analysis. Exercise 15.32 illustrates the use of the
Kruskal-Wallis analysis for such a case.


Exercises

15.29 The table that follows contains data on the leaf length for plants of the same species at each of
four swampy underdeveloped sites. At each site, six plants were randomly selected. For each
plant, ten leaves were randomly selected, and the mean of the ten measurements (in centimeters)
was recorded for each plant from each site. Use the Kruskal-Wallis H test to determine whether
there is sufficient evidence to claim that the distributions of mean leaf lengths differ in location
for at least two of the sites. Use $\alpha = .05$. Bound or find the approximate p-value.

Site     Mean Leaf Length (cm)
1        5.7   6.3   6.1   6.0   5.8   6.2
2        6.2   9.3   5.7   6.0   5.2   5.5
3        5.4   5.0   6.0   5.6   4.0   3.2
4        3.7   3.2   3.9   4.0   3.5   3.6


15.30 A company plans to promote a new product by using one of three advertising campaigns. To
investigate the extent of product recognition resulting from the campaigns, 15 market areas
were selected, and 5 were randomly assigned to each campaign. At the end of the campaigns,
random samples of 400 adults were selected in each area, and the proportions who indicated
familiarity with the product appear in the following table.

        Campaign
1        2        3
.33      .28      .21
.29      .41      .30
.21      .34      .26
.32      .39      .33
.25      .27      .31

a What type of experimental design was used?

b Is there sufficient evidence to indicate a difference in locations of the distributions of product
recognition scores for the three campaigns? Bound or give the approximate p-value.

c Campaigns 2 and 3 were, respectively, the most and least expensive. Is there sufficient
evidence to indicate that campaign 2 is more successful than campaign 3? Test using the
Mann-Whitney U procedure. Give the associated p-value.


15.31 Three different brands of magnetron tubes (the key components in microwave ovens) were
subjected to stressful testing, and the number of hours each operated without repair was recorded
(see the accompanying table). Although these times do not represent typical life lengths, they
do indicate how well the tubes can withstand extreme stress.


Brand A BrandB Brand C 


36 


49 
33 
60 

2 
55 


71 
31 
140 
59 
42 


a Use the F test for a one-way layout (Chapter 13) to test the hypothesis that the mean length
of life under stress is the same for the three brands. Use $\alpha = .05$. What assumptions are
necessary for the validity of this procedure? Is there any reason to doubt these assumptions?

b Use the Kruskal-Wallis test to determine whether evidence exists to conclude that the
brands of magnetron tubes tend to differ in length of life under stress. Test using $\alpha = .05$.


15.32 An experiment was conducted to compare the length of time it takes a person to recover from
each of the three types of influenza: Victoria A, Texas, and Russian. Twenty-one human
subjects were selected at random from a group of volunteers and divided into three groups of 7
each. Each group was randomly assigned a strain of the virus and the influenza was induced in
the subjects. All of the subjects were then cared for under identical conditions, and the recovery
time (in days) was recorded. The ranks of the results appear in the following table.


Victoria А Texas 


Russian 


20 
6.5 

21 

16.5 

12 

18.5 
9 


14.5 
16.5 
4.5 
2.2 
14.5 
12 
18.5 


a Do the data provide sufficient evidence to indicate that the recovery times for one (or more)
type(s) of influenza tend(s) to be longer than for the other types? Give the associated p-value.

b Do the data provide sufficient evidence to indicate a difference in locations of the
distributions of recovery times for the Victoria A and Russian types? Give the associated p-value.


15.33 The EPA wants to determine whether temperature changes in the ocean’s water caused by a
nuclear power plant will have a significant effect on the animal life in the region. Recently
hatched specimens of a certain species of fish are randomly divided into four groups. The
groups are placed in separate simulated ocean environments that are identical in every way
except for water temperature. Six months later, the specimens are weighed. The results (in
ounces) are given in the accompanying table. Do the data provide sufficient evidence to indicate
that one (or more) of the temperatures tend(s) to produce larger weight increases than the other
temperatures? Test using $\alpha = .10$.


Weights of Specimens
38°F 42°F 46°F 50°F
22 15 14 17 
24 21 28 18 
16 26 21 13 
18 16 19 20 
19 25 24 21 
17 23 


15.34 Weevils cause millions of dollars worth of damage each year to cotton crops. Three chemicals
designed to control weevil populations are applied, one to each of three cotton fields. After
3 months, ten plots of equal size are randomly selected within each field and the percentage
of cotton plants with weevil damage is recorded for each. Do the data in the accompanying
table provide sufficient evidence to indicate a difference in location among the distributions of
damage rates corresponding to the three treatments? Give bounds for the associated p-value.


Chemical A    Chemical B    Chemical C


10.8 22.3 9.8 
15.6 19.5 12.3 
19.2 18.6 16.2 
17.9 24.3 14.1 
18.3 19.9 15.3 

9.8 20.4 10.8 
16.7 23.6 12.2 
19.0 21.2 17.3 
20.3 19.8 15.1 
19.4 22.6 11.3 


15.35 The Kruskal-Wallis statistic is

$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{n + 1}{2} \right)^2.$$

Perform the indicated squaring of each term in the sum and add the resulting values to show that

$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1).$$

[Hint: Recall that $\bar{R}_i = R_i/n_i$ and that $\sum_{i=1}^{k} R_i$ = sum of the first n integers = $n(n + 1)/2$.]
15.36 Assuming no ties, obtain the exact null distribution of the Kruskal-Wallis H statistic for the
case k = 3, $n_1 = n_2 = n_3 = 2$. [Because the sample sizes are all equal, if ranks 1 and 2
are assigned to treatment 1, ranks 3 and 4 are assigned to treatment 2, and ranks 5 and 6 are
assigned to treatment 3, the value of H is exactly the same as if ranks 3 and 4 are assigned
to treatment 1, ranks 5 and 6 are assigned to treatment 2, and ranks 1 and 2 are assigned to
treatment 3. That is, for any particular set of ranks, we may interchange the roles of the k
populations and obtain the same values of the H statistic. Thus, the number of cases that we
must consider can be reduced by a factor of 1/k!. Consequently, H must be evaluated only for
$(6!/[2! \cdot 2! \cdot 2!])/3! = 15$ distinct arrangements of ranks.]


15.8 The Friedman Test for Randomized Block Designs


In Section 12.4, we discussed the merits of a randomized block design for an
experiment to compare the performance of several treatments. We assume that b blocks are
used in the experiment, which is designed to compare the locations of the
distributions of the responses corresponding to each of k treatments. The ANOVA, discussed
in Section 13.9, was based on the assumptions that the observations in each
block-treatment combination were normally distributed with equal variances. As in the case
of the one-way layout, SST was the key quantity in the analysis.

The Friedman test, developed by Nobel Prize-winning economist Milton
Friedman (1937), is designed to test the null hypothesis that the probability distributions of
the k treatments are identical versus the alternative that at least two of the distributions
differ in location. The test is based on a statistic that is a rank analogue of SST for the
randomized block design (see Section 13.9) and is computed in the following
manner. After the data from a randomized block design are obtained, within each block
the observed values of the responses to each of the k treatments are ranked from 1 (the
smallest in the block) to k (the largest in the block). If two or more observations in the
same block are tied for the same rank, then the average of the ranks that would have
been assigned to these observations is assigned to each member of the tied group.
However, ties need to be dealt with in this manner only if they occur within the same block.

Let $R_i$ denote the sum of the ranks of the observations corresponding to treatment
i and let $\bar{R}_i = R_i/b$ denote the corresponding average of the ranks (recall that in
a randomized block design, each treatment is applied exactly once in each block,
resulting in a total of b observations per treatment and hence in a total of bk
observations). Because ranks of 1 to k are assigned within each block, the sum of the
ranks assigned in each block is $1 + 2 + \cdots + k = k(k + 1)/2$. Thus, the sum of all
the ranks assigned in the analysis is $bk(k + 1)/2$. If $\bar{R}$ denotes the overall average of
the ranks of all the bk observations, it follows that $\bar{R} = (k + 1)/2$. Consider the rank
analog of SST for a randomized block design given by

$$W = b \sum_{i=1}^{k} (\bar{R}_i - \bar{R})^2.$$


If the null hypothesis is true and the probability distributions of the treatment responses
do not differ in location, we expect the $\bar{R}_i$-values to be approximately equal and the
resulting value for W to be small. If the alternative hypothesis were true, we would
expect this to lead to differences among the $\bar{R}_i$-values and corresponding large values
of W. Instead of W, Friedman considered the statistic $F_r = 12W/[k(k + 1)]$, which
may be rewritten (see Exercise 15.44) as

$$F_r = \frac{12}{bk(k + 1)} \sum_{i=1}^{k} R_i^2 - 3b(k + 1).$$

As previously noted, the null hypothesis of equal locations is rejected in favor of the
alternative that the treatment distributions differ in location if the value of $F_r$ is large.
That is, the corresponding $\alpha$-level test rejects the null hypothesis in favor of the
alternative if $F_r > f_r(\alpha)$, where $f_r(\alpha)$ is such that, when $H_0$ is true, $P[F_r > f_r(\alpha)] = \alpha$.

If there are no ties among the observations within the blocks, the null distribution of
$F_r$ can (tediously) be found by using the methods of Chapter 2. For any values of b and
k, the distribution of $F_r$ is found as follows. If the null hypothesis is true, then each of
the k! permutations of the ranks 1, 2, ..., k within each block is equally likely. Further,
because we assume that the observations in different blocks are mutually independent,
it follows that each of the $(k!)^b$ possible combinations of the b sets of permutations
for the within-block ranks is equally likely when $H_0$ is true. Consequently, we can
evaluate the value of $F_r$ for each possible case and thereby give the null distribution
of $F_r$ (see Exercise 15.45). Selected values for $f_r(\alpha)$ for various choices of k and b
are given in Table A.22 of Hollander and Wolfe (1999). Like the other nonparametric
procedures discussed in this chapter, the real advantage of this procedure is that
it can be used regardless of the form of the actual distributions of the populations
corresponding to the treatments.
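The enumeration just described can be sketched in a few lines of Python. This is only an illustration under the no-ties assumption; the function name `friedman_null_distribution` and the choice b = 3, k = 3 are ours, not the text's. Exact fractions are used so that each probability is a count divided by $(k!)^b$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def friedman_null_distribution(b, k):
    """Exact null distribution of F_r for b blocks and k treatments (no ties)."""
    rank_orders = list(permutations(range(1, k + 1)))  # the k! within-block rankings
    counts = Counter()
    # Every combination of b within-block rankings is equally likely under H0.
    for blocks in product(rank_orders, repeat=b):
        rank_sums = [sum(column) for column in zip(*blocks)]  # R_i for each treatment
        f_r = (Fraction(12, b * k * (k + 1)) * sum(r * r for r in rank_sums)
               - 3 * b * (k + 1))
        counts[f_r] += 1
    total = len(rank_orders) ** b  # (k!)^b equally likely cases
    return {value: Fraction(n, total) for value, n in sorted(counts.items())}


# Illustrative (hypothetical) design: b = 3 blocks, k = 3 treatments.
dist = friedman_null_distribution(b=3, k=3)
for value, prob in dist.items():
    print(float(value), prob)
```

Because the number of cases grows as $(k!)^b$, this brute-force approach is practical only for small designs, which is why tabled critical values or the large-sample approximation are used otherwise.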

As with the Kruskal-Wallis statistic, the null distribution of the Friedman $F_r$
statistic can be approximated by a $\chi^2$ distribution with k - 1 df as long as b is
"large." Empirical evidence indicates that the approximation is adequate if either b
(the number of blocks) or k (the number of treatments) exceeds 5. Again, our examples
and exercises deal with situations where this large-sample approximation is adequate.
If you need to implement a Friedman analysis for small samples, refer to Hollander
and Wolfe (1999) to obtain appropriate critical values.




Friedman Test Based on $F_r$ for a Randomized Block Design


Null hypothesis: $H_0$: The probability distributions for the k treatments are
identical.

Alternative hypothesis: $H_a$: At least two of the distributions differ in location.

Test statistic: $F_r = \frac{12}{bk(k + 1)} \sum_{i=1}^{k} R_i^2 - 3b(k + 1)$, where

b = number of blocks,

k = number of treatments,

$R_i$ = sum of the ranks for the ith treatment, where the rank of each measurement
is computed relative to its size within its own block.

Rejection region: $F_r > \chi^2_\alpha$ with (k - 1) df.

Assumptions: The treatments are randomly assigned to experimental units
within blocks. Either the number of blocks (b) or the number of treatments
(k) exceeds 5.


EXAMPLE 15.8

An experiment to compare completion times for three technical tasks was performed
in the following manner. Because completion times may vary considerably from
person to person, each of the six technicians was asked to perform all three tasks.
The tasks were presented to each technician in a random order with suitable time lags
between the tasks. Do the data in Table 15.7 present sufficient evidence to indicate
that the distributions of completion times for the three tasks differ in location? Use
$\alpha$ = .05. Give bounds for the associated p-value.


Table 15.7 Completion times for three tasks

Technician   Task A   Rank   Task B   Rank   Task C   Rank
1            1.21     1      1.56     3      1.48     2
2            1.63     1.5    2.01     3      1.63     1.5
3            1.42     1      1.70     2      2.06     3
4            1.16     1      1.27     2.5    1.27     2.5
5            2.43     2      2.64     3      1.98     1
6            1.94     1      2.81     3      2.44     2
             $R_1$ = 7.5     $R_2$ = 16.5    $R_3$ = 12


Solution

The experiment was run according to a randomized block design with technicians
playing the role of blocks. In this case, k = 3 treatments are compared using b = 6
blocks. Because the number of blocks exceeds 5, we may use the Friedman analysis
and compare the value of $F_r$ to $\chi^2_\alpha$, based on k - 1 = 2 df. Consulting Table 6,
Appendix 3, we find $\chi^2_{.05}$ = 5.99147. For the data given in Table 15.7,

$$F_r = \frac{12}{6(3)(4)}\left[(7.5)^2 + (16.5)^2 + (12)^2\right] - 3(6)(4) = 6.75.$$

Because $F_r$ = 6.75, which exceeds 5.99147, we conclude at the $\alpha$ = .05 level that
the completion times of at least two of the three tasks possess probability distributions
that differ in location.

Because $F_r$ = 6.75 is the observed value of a statistic that has approximately a
$\chi^2$ distribution with 2 df, it follows that (approximately) .025 < p-value < .05. The
applet Chi-Square Probability and Quantiles applies to establish that the approximate
p-value = $P(\chi^2 > 6.75)$ = .0342.
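The arithmetic of Example 15.8 can be checked with a short Python sketch. The helper names `midranks` and `friedman_statistic` are illustrative, not part of the text. Tied times within a block receive the average of the ranks they would occupy, and the p-value line relies on the fact that the upper tail of a $\chi^2$ distribution with exactly 2 df is $e^{-x/2}$, so it does not generalize to other degrees of freedom.

```python
import math


def midranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    ordered = sorted(values)
    # A value first appearing at 0-based position p and occurring c times occupies
    # ranks p+1, ..., p+c, whose average is p + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def friedman_statistic(blocks):
    """Return the treatment rank sums R_i and F_r for a list of blocks (rows)."""
    b, k = len(blocks), len(blocks[0])
    ranked = [midranks(row) for row in blocks]
    rank_sums = [sum(column) for column in zip(*ranked)]
    f_r = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return rank_sums, f_r


# Completion times from Table 15.7: one row per technician, columns are Tasks A, B, C.
times = [
    [1.21, 1.56, 1.48],
    [1.63, 2.01, 1.63],
    [1.42, 1.70, 2.06],
    [1.16, 1.27, 1.27],
    [2.43, 2.64, 1.98],
    [1.94, 2.81, 2.44],
]

rank_sums, f_r = friedman_statistic(times)
p_value = math.exp(-f_r / 2)  # chi-square upper tail, valid for 2 df only

print(rank_sums)           # rank sums for Tasks A, B, C
print(round(f_r, 2))       # 6.75, as in the example
print(round(p_value, 4))   # 0.0342, matching the applet value
```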


In some situations, it might be easy to rank the responses within each block 
but much more difficult to assign a meaningful numerical value to the response to 
each treatment in the blocks. An example illustrating this scenario is provided in 
Exercise 15.42. 

It can be seen (see Exercise 15.43) that, if we wish to compare only k = 2 
treatments using a randomized block design (so that the blocks are of size 2), the 
Friedman statistic is the square of the standardized sign statistic (that is, the square 
of the Z statistic given in Section 15.3). Thus, for k = 2, the Friedman analysis is 
equivalent to a two-tailed sign test.


Exercises


In a study of palatability of antibiotics for children, Doreen Matsui and her colleagues used a 
voluntary sample of healthy children to assess their reactions to the taste of four antibiotics. The 
children's responses were measured on a 10-centimeter visual analog scale that incorporated 
the use of faces, from sad (low score) to happy (high score). The minimum and maximum 
scores were, respectively, 0 and 10. The data in the following table (simulated from the results 
given in Matsui's report) were obtained when each of five children were asked to rate the taste 
of all four antibiotics. 


              Antibiotic
Child    I      II     III    IV
1        4.8    2.2    6.8    6.2
2        8.1    9.2    6.6    9.6
3        5.0    2.6    3.6    6.5
4        7.9    9.4    5.3    8.5
5        3.9    7.4    2.1    2.0


a Is there sufficient evidence to conclude that there are differences in the perceived taste of
the different antibiotics? Bound or find the approximate p-value.

b What would you conclude at the $\alpha$ = .05 level of significance?

c Why did Matsui have each child rank all four antibiotics instead of using 20 different
children, randomly selecting 5 to receive only antibiotic I, another 5 to receive only antibiotic
II, 5 of those remaining to receive only antibiotic III, with the 5 remaining receiving only
antibiotic IV?


An experiment was performed to assess whether heavy metals accumulate in plants grown in 
soils amended with sludge and if there is an associated accumulation of those metals in aphids 
feeding on those plants. The data in the accompanying table are cadmium concentrations (in 
micrograms/kilogram) in plants grown under six different rates of sludge application for three 
different harvests. The application rates are the treatments, and the three harvests represent 
blocks of time. 


Harvest 
Rate 1 2 3 
Control 162.1 153.7 200.4 
1 199.8 199.6 278.2 
2 220.0 210.7 294.8 
3 194.4 179.0 341.1 
4 204.3 203.7 330.2 
5 218.9 236.1 344.2 


5. Source: D. Matsui et al., "Assessment of the Palatability of β-Lactamase-Resistant Antibiotics in
Children," Archives of Pediatric Adolescent Medicine 151 (1997): 559-601.

6. Source: G. Merrington, L. Winder, and I. Green, "The Uptake of Cadmium and Zinc by the Birdcherry
Oat Aphid Rhopalosiphum Padi (Homoptera:Aphididae) Feeding on Wheat Grown on Sewage Sludge
Amended Agricultural Soil," Environmental Pollution 96(1) (1997): 111-114.

a Is there sufficient evidence to indicate a difference in cadmium accumulation in plants
grown in plots subjected to different levels of sludge application? Bound or determine the
approximate p-value.


b What would you conclude at the $\alpha$ = .01 significance level?


Corrosion of metals is a problem in many mechanical devices. Three sealants used to help 
retard the corrosion of metals were tested to see whether there were any differences among 
them. Samples from ten different ingots of the same metal composition were treated with each 
of the three sealants, and the amount of corrosion was measured after exposure to the same 
environmental conditions for 1 month. The data are given in the accompanying table. Is there 
any evidence of a difference in the abilities of the sealants to prevent corrosion? Test using 
a = .05. 


              Sealant
Ingot    I      II     III
1        4.6    4.2    4.9
2        7.2    6.4    7.0
3        3.4    3.5    3.4
4        6.2    5.3    5.9
5        8.4    6.8    7.8
6        5.6    4.8    5.7
7        3.7    3.7    4.1
8        6.1    6.2    6.4
9        4.9    4.1    4.2
10       5.2    5.0    5.1


A serious drought-related problem for farmers is the spread of aflatoxin, a highly toxic substance 
caused by mold, which contaminates field corn. In higher levels of contamination, aflatoxin is 
hazardous to animal and possibly human health. (Officials of the FDA have set a maximum 
limit of 20 parts per billion aflatoxin as safe for interstate marketing.) Three sprays, A, B, and 
C, have been developed to control aflatoxin in field corn. To determine whether differences 
exist among the sprays, ten ears of corn are randomly chosen from a contaminated corn field, 
and each is divided into three pieces of equal size. The sprays are then randomly assigned to the 
pieces for each ear of corn, thus setting up a randomized block design. The accompanying table 
gives the amount (in parts per billion) of aflatoxin present in the corn samples after spraying. 
Use the Friedman test based on F, to determine whether there are differences among the sprays 
for control of aflatoxin. Give approximate bounds for the p-value. 


Spray Spray 
Ear A B C Ear A B C 
1 21 23 15 6 5 12 6 
2 29 30 21 7 18 18 12 
3 16 19 18 8 26 32 21 
4 20 19 18 9 17 20 9 
5 13 10 14 10 4 10 2 


A study was performed to compare the preferences of eight "expert listeners" regarding 15
models (with approximately equal list prices) of a particular component in a stereo system.
Every effort was made to ensure that differences perceived by the listeners were due to the
component of interest and no other cause (all other components in the system were identical,
the same type of music was used, the music was played in the same room, etc.). Thus, the
results of the listening test reflect the audio preferences of the judges and not judgments
regarding quality, reliability, or other variables. Further, the results pertain only to the models
of the components used in the study and not to any other models that may be offered by the
various manufacturers. The data in the accompanying table give the results of the listening
tests. The models are depicted simply as models A, B, ..., O. Under each column heading
are the numbers of judges who ranked each brand of component from 1 (lowest rank) to 15
(highest rank).


                                 Rank
Model    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A        0   0   0   0   0   0   0   0   0   0   0   0   0   0   8
B        0   0   0   1   0   2   1   1   1   0   0   0   0   2   0
C        0   1   1   1   4   0   0   1   0   0   0   0   0   0   0
D        1   0   1   1   0   1   0   0   1   0   1   0   1   1   0
E        0   2   1   3   0   2   0   0   0   0   0   0   0   0   0
F        0   0   0   0   0   0   0   0   1   2   2   3   0   0   0
G        0   0   0   0   0   0   0   0   0   1   0   2   4   1   0
H        1   2   1   1   0   0   2   1   0   0   0   0   0   0   0
I        3   2   1   0   0   0   0   0   0   1   0   1   0   0   0
J        0   0   1   0   2   0   2   0   0   0   2   0   1   0   0
K        0   0   0   0   0   0   1   1   0   2   1   1   1   1   0
L        0   0   0   0   0   0   1   1   4   0   1   0   1   0   0
M        1   1   2   1   1   2   0   0   0   0   0   0   0   0   0
N        2   0   0   0   0   0   1   1   0   0   0   1   0   3   0
O        0   0   0   0   1   1   0   2   1   2   1   0   0   0   0


Use the Friedman procedure to test whether the distributions of the preference scores differ
in location for the 15 component models. Give bounds for the attained significance level.
What would you conclude at the $\alpha$ = .01 level of significance? [Hint: The sum of the ranks
associated with the component of model O is 5 + 6 + 8 + 8 + 9 + 10 + 10 + 11 = 67; other
rank sums can be computed in an analogous manner.]


If, prior to running the experiment, we desired to compare components of models G and H, 
this comparison could be made by using the sign test presented in Section 15.3. Using the 
information just given, we can determine that model G was preferred to model H by all eight 
judges. Explain why. Give the attained significance level if the sign test is used to compare 
components of models G and H. 


Explain why there is not enough information given to use the sign test in a comparison of 
only models H and M. 


An experiment is conducted to investigate the toxic effect of three chemicals, A, B, and C, 
on the skin of rats. Three adjacent j-inch squares are marked on the backs of eight rats, and 
each of the three chemicals is applied to each rat. The squares of skin on each rat are ranked 
according to severity of irritation (1 = least severe, 3 = most severe). The resulting data are
given in the accompanying table. Is there sufficient evidence to support the research hypothesis 
that the probability distributions of skin irritation scores corresponding to the three chemicals 
differ in location? Use $\alpha$ = .01. (Note: Ranking the severity of reactions to the chemicals for
each rat is probably much more meaningful than assigning an arbitrary "irritation score" to
each portion of skin.)

       Chemical
Rat    A    B    C
1      3    2    1
2      3    2    1
3      2    3    1
4      1    3    2
5      1    2    3
6      1    3    2
7      2    3    1
8      2    3    1


Consider the Friedman statistic $F_r$ when k = 2 and b = (number of blocks) = n. Then,
$F_r = (2/n)(R_1^2 + R_2^2) - 9n$. Let M be the number of blocks (pairs) in which treatment one has
rank 1. If there are no ties, then treatment 1 has rank 2 in the remaining n - M pairs. Thus, $R_1$ =
M + 2(n - M) = 2n - M. Analogously, $R_2$ = n + M. Substitute these values into the preceding
expression for $F_r$ and show that the resulting value is $4(M - .5n)^2/n$. Compare this result with
the square of the Z statistic in Section 15.3. This procedure demonstrates that $F_r = Z^2$.


Consider the Friedman statistic

$$F_r = \frac{12b}{k(k + 1)} \sum_{i=1}^{k} (\bar{R}_i - \bar{R})^2.$$

Square each term in the sum, and show that an alternative form of $F_r$ is

$$F_r = \frac{12}{bk(k + 1)} \sum_{i=1}^{k} R_i^2 - 3b(k + 1).$$

[Hint: Recall that $\bar{R}_i = R_i/b$, $\bar{R} = (k + 1)/2$, and note that $\sum_{i=1}^{k} R_i$ = sum of all of the ranks =
bk(k + 1)/2.]


If there are no ties and b = 2, k = 3, derive the exact null distribution of $F_r$.


The Runs Test: A Test for Randomness 


Consider a production process in which manufactured items emerge in sequence and 
each is classified as either defective (D) or nondefective (N). We have studied how 
we might compare the fraction of defectives for two equal time intervals by using a 
Z test (Chapter 10) and extended this to test the hypothesis of constant p over two or 
more time intervals by using the x? test of Chapter 14. The purposes of these tests 
were to detect a change or trend in the fraction of defectives, p. Evidence to indicate 
an increasing fraction of defectives might indicate the need for a process study to 
locate the source of difficulty. A decreasing value might suggest that a process quality 
control program was having a beneficial effect in reducing the fraction of defectives. 

Trends in the fraction of defective items (or other quality measures) are not the 
only indication of lack of process control. A process might be causing periodic runs of 
defective items even though the average fraction of defective items remains constant, 
for all practical purposes, over long periods of time. For example, spotlight bulbs 
are manufactured on a rotating machine with a fixed number of positions for bulbs. 
A bulb is placed on the machine at a given position, the air is removed, gases are 
pumped into the bulb, and the glass base is flame-sealed. If a machine contains 20
positions and several adjacent positions are faulty (perhaps due to too much heat used
in the sealing process), surges of defective bulbs will emerge from the process in a 
periodic manner. Tests that compare the process fraction of defective items produced 
during equal intervals of time will not detect this periodic difficulty in the process. 
This periodicity, indicated by runs of defectives, indicates nonrandomness in the 
occurrence of defective items over time and can be detected by a test for randomness. 
The statistical test we present, known as the runs test, is discussed in detail by Wald 
and Wolfowitz (1940). Other practical applications of the runs test will follow. 

As the name implies, the runs test is used to study a sequence of events where 
each element in the sequence can assume one of two outcomes, success (S) or failure 
(F). If we think of the sequence of items emerging from a manufacturing process as 
defective (F) or nondefective (S), the observation of twenty items might yield 


S S S S S F F S S S
F F F S S S S S S S


We notice the groupings of defectives and nondefectives and wonder whether this 
grouping implies nonrandomness and, consequently, lack of process control. 


A run is a maximal subsequence of like elements. 


For example, the first five successes constitute a maximal subsequence of 5 like 
elements (that is, it includes the maximum number of like elements before encoun- 
tering an F). (The first 4 elements form a subsequence of like elements, but it is 
not maximal because the 5th element also could be included.) Consequently, the 20 
elements are arranged in five runs, the first containing five S’s, the second containing 
two F’s, and so on. 
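Counting runs by eye is error-prone for long sequences. The following Python sketch (the function name `runs` is illustrative) splits a sequence into its maximal subsequences of like elements with `itertools.groupby` and applies it to the twenty-item sequence above.

```python
from itertools import groupby


def runs(sequence):
    """Split a sequence into maximal runs, returned as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


# The twenty items discussed above: S = nondefective, F = defective.
sequence = "SSSSSFFSSSFFFSSSSSSS"
run_list = runs(sequence)

print(run_list)        # [('S', 5), ('F', 2), ('S', 3), ('F', 3), ('S', 7)]
print(len(run_list))   # R = 5 runs
```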

A very small or very large number of runs in a sequence indicates nonrandomness.
Therefore, let R (the number of runs in a sequence) be the test statistic and let the
rejection region be $R \le k_1$ and $R \ge k_2$, as indicated in Figure 15.3. We must then find
the probability distribution for R, P(R = r), to calculate $\alpha$ and to locate a suitable
rejection region for the test.

Suppose that the complete sequence contains $n_1$ S elements and $n_2$ F elements, resulting
in $Y_1$ runs of S's and $Y_2$ runs of F's, where $(Y_1 + Y_2) = R$. Then, for a given $Y_1$,
$Y_2$ can equal $Y_1$, $(Y_1 - 1)$, or $(Y_1 + 1)$. Let m denote the maximum possible number of
runs. Notice that $m = 2n_1$ if $n_1 = n_2$, and that $m = (2n_1 + 1)$ if $n_1 < n_2$. We will suppose
that every distinguishable arrangement of the $(n_1 + n_2)$ elements in the sequence
constitutes a simple event for the experiment and that the sample points are equiprobable.
It then remains for us to count the number of sample points that imply R runs.

The total number of distinguishable arrangements of $n_1$ S elements and $n_2$ F
elements is

$$\binom{n_1 + n_2}{n_1},$$

and therefore the probability per sample point is

$$\frac{1}{\binom{n_1 + n_2}{n_1}}.$$

[FIGURE 15.3: rejection region for the runs test. Horizontal axis: Number of Runs R, from 2 to m, with "Reject" regions in the lower tail (up to $k_1$) and the upper tail (from $k_2$).]

FIGURE 15.4 The distribution of $n_1$ S elements in $y_1$ cells (none empty)

|S|SSSS|SS...|S S|SSS|S|


The number of ways of achieving $y_1$ S runs is equal to the number of identifiable
arrangements of $n_1$ indistinguishable elements in $y_1$ cells, none of which is empty,
as represented in Figure 15.4. This is equal to the number of ways of distributing the
$(y_1 - 1)$ inner bars in the $(n_1 - 1)$ spaces between the S elements (the outer two bars
remain fixed). Consequently, it is equal to the number of ways of selecting $(y_1 - 1)$
spaces (for the bars) out of the $(n_1 - 1)$ spaces available, or

$$\binom{n_1 - 1}{y_1 - 1}.$$


The number of ways of observing $y_1$ S runs and $y_2$ F runs, obtained by applying
the mn rule, is

$$\binom{n_1 - 1}{y_1 - 1}\binom{n_2 - 1}{y_2 - 1}.$$


This gives the number of sample points in the event "$y_1$ runs of S's and $y_2$ runs of
F's." Then, multiplying this number by the probability per sample point, we obtain
the probability of exactly $y_1$ runs of S's and $y_2$ runs of F's:

$$p(y_1, y_2) = \frac{\binom{n_1 - 1}{y_1 - 1}\binom{n_2 - 1}{y_2 - 1}}{\binom{n_1 + n_2}{n_1}}.$$

Then, P(R = r) equals the sum of $p(y_1, y_2)$ over all values of $y_1$ and $y_2$ such that
$(y_1 + y_2) = r$.

To illustrate the use of the formula, the event R = 4 could occur when $y_1 = 2$ and
$y_2 = 2$ with either the S or F elements commencing the sequences. Consequently,

$$P(R = 4) = 2P(Y_1 = 2, Y_2 = 2).$$

On the other hand, R = 5 could occur when $y_1 = 2$ and $y_2 = 3$ or when $y_1 = 3$ and
$y_2 = 2$, and these occurrences are mutually exclusive. Then,

$$P(R = 5) = P(Y_1 = 3, Y_2 = 2) + P(Y_1 = 2, Y_2 = 3).$$


EXAMPLE 15.9 Suppose that a sequence consists of $n_1 = 5$ S elements and $n_2 = 3$ F elements.
Calculate the probability of observing R = 3 runs. Also, calculate $P(R \le 3)$.


Solution Three runs could occur when $y_1 = 2$ and $y_2 = 1$, or when $y_1 = 1$ and $y_2 = 2$. Then,

$$P(R = 3) = P(Y_1 = 2, Y_2 = 1) + P(Y_1 = 1, Y_2 = 2)
= \frac{\binom{4}{1}\binom{2}{0}}{\binom{8}{5}} + \frac{\binom{4}{0}\binom{2}{1}}{\binom{8}{5}}
= \frac{4}{56} + \frac{2}{56} = .107.$$

Next, we require that $P(R \le 3) = P(R = 2) + P(R = 3)$. Accordingly,

$$P(R = 2) = 2P(Y_1 = 1, Y_2 = 1) = \frac{2\binom{4}{0}\binom{2}{0}}{\binom{8}{5}} = \frac{2}{56} = .036.$$

Thus, the probability of 3 or fewer runs is .107 + .036 = .143.
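The counting argument above translates directly into code. The sketch below is illustrative only (the name `runs_pmf` is ours); it assumes $n_1, n_2 \ge 1$ and Python 3.8 or later for `math.comb`. An even number of runs r = 2y requires y runs of each kind, with either symbol first, while an odd number r = 2y + 1 requires y runs of one kind and y + 1 of the other.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution P(R = r) for n1 S elements and n2 F elements."""
    total = comb(n1 + n2, n1)  # number of equally likely arrangements
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        if r % 2 == 0:
            y = r // 2
            # y runs of each kind; the sequence may start with S or with F.
            ways = 2 * comb(n1 - 1, y - 1) * comb(n2 - 1, y - 1)
        else:
            y = (r - 1) // 2
            # Either (y + 1) S runs and y F runs, or y S runs and (y + 1) F runs.
            ways = (comb(n1 - 1, y) * comb(n2 - 1, y - 1)
                    + comb(n1 - 1, y - 1) * comb(n2 - 1, y))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


# Check against the worked example with n1 = 5 and n2 = 3.
pmf = runs_pmf(5, 3)
print(pmf[3], float(pmf[3]))                    # 3/28, about .107
print(pmf[2] + pmf[3], float(pmf[2] + pmf[3]))  # 1/7, about .143
```

Summing the returned probabilities up to a given r reproduces the cumulative values $P(R \le a)$ that a runs table reports.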


The values of $P(R \le a)$ are given in Table 10, Appendix 3, for all combinations
of $n_1$ and $n_2$, where $n_1$ and $n_2$ are less than or equal to 10. These can be used to locate
the rejection regions of one- or two-tailed tests. We illustrate with an example.


EXAMPLE 15.10

A true-false examination was constructed with the answers running in the following
sequence:


T F F T F T F T T F T F F T F T F T T F


Does this sequence indicate a departure from randomness in the arrangement of T
and F answers?


Solution

The sequence contains $n_1 = 10$ T and $n_2 = 10$ F answers, with R = 16 runs.
Nonrandomness can be indicated by either an unusually small or an unusually large
number of runs; consequently, we will be using a two-tailed test.

Suppose that we wish to use $\alpha$ approximately equal to .05 with .025 or less in each
tail of the rejection region. Then, from Table 10, Appendix 3, with $n_1 = n_2 = 10$,
we see that $P(R \le 6) = .019$ and $P(R \le 15) = .981$. Then, $P(R \ge 16) =
1 - P(R \le 15) = .019$, and we would reject the hypothesis of randomness at the
$\alpha$ = .038 significance level if $R \le 6$ or $R \ge 16$. Because R = 16 for the observed
data, we conclude that evidence exists to indicate nonrandomness in the professor's
arrangement of answers. The attempt to mix the answers was overdone.


A second application of the runs test is in detecting nonrandomness of a sequence
of quantitative measurements over time. These sequences, known as time series,
occur in many fields. For example, the measurement of a quality characteristic of an
industrial product, blood pressure of a person, and the price of a stock on the stock
market all vary over time. Departures from randomness in a series, caused either by
trends or periodicities, can be detected by examining the deviations of the time series
measurements from their average. Negative and positive deviations could be denoted
by S and F, respectively, and we could then test this time sequence of deviations for
nonrandomness. We illustrate with an example.


EXAMPLE 15.11

Paper is produced in a continuous process. Suppose that a brightness measurement
Y is made on the paper once every hour and that the results appear as shown in
Figure 15.5.

The average $\bar{y}$ for the 15 sample measurements appears as shown. Notice the
deviations about $\bar{y}$. Do these data indicate a lack of randomness and thereby suggest
periodicity and lack of control in the process?


FIGURE 15.5 Paper brightness versus time (vertical axis: Brightness, y, with the average $\bar{y}$ marked; horizontal axis: Time (hours))


Solution

The sequence of negative (S) and positive (F) deviations as indicated in Figure 15.5
is


S S S S F F S F F S F S S S S


Then, $n_1 = 10$, $n_2 = 5$, and R = 7. Consulting Table 10 in Appendix 3, we find
$P(R \le 7) = .455$. This value of R is not improbable, assuming the hypothesis
of randomness to be true. Consequently, there is not sufficient evidence to indicate
nonrandomness in the sequence of brightness measurements.


The runs test can also be used to compare two population frequency distributions 
for a two-sample unpaired experiment. Thus, it provides an alternative to the Mann- 
Whitney U test (Section 15.6). If the measurements for the two samples are arranged 
in order of magnitude, they form a sequence. The measurements for samples 1 and 2 
can be denoted as S and F, respectively, and once again we are concerned with a test 
for randomness. If all measurements for sample 1 are smaller than those for sample 
2, the sequence will result in SSSS...SFFF...F, or R = 2 runs. A small value
of R provides evidence of a difference in population frequency distributions, and the
rejection region chosen is $R \le a$. This rejection region implies a one-tailed statistical
test. An illustration of the application of the runs test to compare two population 
frequency distributions is left as an exercise.

As in the case of the other nonparametric test statistics studied in earlier sections
of this chapter, the probability distribution for R tends toward normality as $n_1$ and $n_2$
become large. The approximation is good when $n_1$ and $n_2$ are both greater than 10.
Consequently, we may use the Z statistic as a large-sample test statistic, where

$$Z = \frac{R - E(R)}{\sqrt{V(R)}},$$

$$E(R) = \frac{2n_1 n_2}{n_1 + n_2} + 1,$$

and

$$V(R) = \frac{2n_1 n_2 (2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$

are the expected value and variance of R, respectively. The rejection region for a
two-tailed test, with $\alpha$ = .05, is |z| > 1.96. If $\alpha$ is the desired probability of a type
I error, for an upper-tail test, we reject the null hypothesis if $z > z_\alpha$ (for a lower-tail
test, we reject $H_0$ if $z < -z_\alpha$).
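As a rough illustration of the large-sample procedure, the Python sketch below evaluates E(R), V(R), and Z. The function name `runs_z` is ours, and the inputs are the counts from the true-false example (R = 16, $n_1 = n_2 = 10$), which sit just at the edge of the range where the text calls the approximation good, so the result is a comparison with the exact analysis rather than a replacement for it. No continuity correction is applied.

```python
import math


def runs_z(r, n1, n2):
    """Return E(R), V(R), and the large-sample Z statistic for r observed runs."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    z = (r - mean) / math.sqrt(var)
    return mean, var, z


mean, var, z = runs_z(r=16, n1=10, n2=10)

# Two-tailed normal p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(mean, round(var, 4), round(z, 3), round(p_two_sided, 4))
```

Here |z| exceeds 1.96, so the approximate two-tailed test at $\alpha$ = .05 reaches the same conclusion as the exact test based on the tabled probabilities.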


Exercises 


Consider a runs test based on $n_1 = n_2 = 5$ elements. Assuming $H_0$ to be true, use Table 10,
Appendix 3, to find the following:


a $P(R = 2)$.
b $P(R \le 3)$.
c $P(R \le 4)$.


A union supervisor claims that applicants for jobs are selected without regard to race. The 
hiring records of the local—one that contains all male members—gave the following sequence 
of White (W) and Black (B) hirings: 


W W Ww W BW WwW WwW B BW В В 
Do these data suggest a nonrandom racial selection in the hiring of the union's members? 


The conditions (D for diseased, S for sound) of the individual trees in a row of ten poplars
were found to be, from left to right:


S S D D S D D D S S


Is there sufficient evidence to indicate nonrandomness in the sequence and therefore the pos- 
sibility of contagion? 


Items emerging from a continuous production process were classified as defective (D) or 
nondefective (N). A sequence of items observed over time was as follows: 


D N N N N N N D D N N N N N N D D
D N N N N N D N N N D D N N N D D

a Compute the probability that $R \le 11$, where $n_1 = 11$ and $n_2 = 23$.

b Do these data suggest lack of randomness in the occurrence of defectives and nondefectives?
Use the large-sample approximation for the runs test.


A quality control chart has been maintained for a measurable characteristic of items taken from
a conveyor belt at a fixed point in a production line. The measurements obtained today, in order 
of time, are as follows: 


68.2 71.6 69.3 71.6 70.4 65.0 63.6 64.7 
65.3 64.2 67.6 68.6 66.8 68.9 66.8 70.1 


a Classify the measurements in this time series as above or below the sample mean and 
determine (using the runs test) whether consecutive observations suggest lack of stability 
in the production process. 

b Divide the time period into two equal parts and compare the means, using Student's t test.
Do the data provide evidence of a shift in the mean level of the quality characteristics? 
Explain. 


Refer to Exercise 15.24. Use the runs test to analyze the data. Compare your answer here with 
your answer to Exercise 15.24. 


Refer to Exercise 15.25. If indeed the experimental batteries have a greater mean life, what 
would be the effect of this on the expected number of runs? Using the large-sample theory for 
the runs test, test (using $\alpha$ = .05) whether there is a difference in the distributions of battery
life for the two populations. Give the approximate p-value. 


Rank Correlation Coefficient 


In the preceding sections, we used ranks to indicate the relative magnitude of observa- 
tions in nonparametric tests for comparison of treatments. We now employ the same 
technique in testing for a correlation between two ranked variables. Two common 
rank correlation coefficients are Spearman's statistic $r_S$ and Kendall's $\tau$. We present
the Spearman $r_S$ because its computation is analogous to that of the sample correlation
coefficient r of Section 11.8. Kendall's rank correlation coefficient is discussed
in detail in Kendall and Stuart (1979). 

Suppose that eight elementary-science teachers have been ranked by a judge 
according to their teaching ability, and all have taken a national teachers’ exami- 
nation. The data are given in Table 15.8. Do the data suggest agreement between 
the judge's ranking and the examination score? Alternatively, we might express this 
question by asking whether a correlation exists between the judge's ranking and the 
ranks of examination scores. 

The two variables of interest are rank and test score. The former is already in
rank form, and the test scores may be ranked similarly, as shown in parentheses in
Table 15.8. The ranks for tied observations are obtained by averaging the ranks that
the tied observations would occupy, as is done for the Mann-Whitney U statistic.


Table 15.8 Data for science teachers

Teacher   Judge's Rank   Examination Score
1         7              44 (1)
2         4              72 (5)
3         2              69 (3)
4         6              70 (4)
5         1              93 (8)
6         3              82 (7)
7         8              67 (2)
8         5              80 (6)

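Recall that the sample correlation coefficient (Section 11.8) for observations
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is given by

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
= \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right]}}.$$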

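Let $R(x_i)$ denote the rank of $x_i$ among $x_1, x_2, \ldots, x_n$ and $R(y_i)$ denote the rank of
$y_i$ among $y_1, y_2, \ldots, y_n$. The Spearman rank correlation coefficient, $r_S$, is calculated
by substituting the ranks as the paired measurements in the above formula. Thus,

$$r_S = \frac{\sum_{i=1}^{n} R(x_i)R(y_i) - \frac{1}{n}\left[\sum_{i=1}^{n} R(x_i)\right]\left[\sum_{i=1}^{n} R(y_i)\right]}
{\sqrt{\left\{\sum_{i=1}^{n} [R(x_i)]^2 - \frac{1}{n}\left[\sum_{i=1}^{n} R(x_i)\right]^2\right\}\left\{\sum_{i=1}^{n} [R(y_i)]^2 - \frac{1}{n}\left[\sum_{i=1}^{n} R(y_i)\right]^2\right\}}}.$$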

When there are no ties in either the x observations or the y observations, this
expression for $r_S$ algebraically reduces to a simpler expression:

$$r_S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \quad \text{where } d_i = R(x_i) - R(y_i).$$


If the number of ties is small in comparison with the number of data pairs, little 
error will result from using this shortcut formula. We leave proof of this simplification 
as an exercise (Exercise 15.78) and illustrate the use of the formula by an example. 


EXAMPLE 15.12 Calculate $r_S$ for the judge's ranking and examination score data from Table 15.8.


Solution The differences and squares of differences between the two rankings are shown in
Table 15.9.
Substituting into the formula for $r_S$, we obtain

$$r_S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6(144)}{8(64 - 1)} = -.714.$$


Table 15.9 Data and calculations for Example 15.12

Teacher   $R(x_i)$   $R(y_i)$   $d_i$   $d_i^2$
1         7          1          6       36
2         4          5          -1      1
3         2          3          -1      1
4         6          4          2       4
5         1          8          -7      49
6         3          7          -4      16
7         8          2          6       36
8         5          6          -1      1
Total                                   144

The Spearman rank correlation coefficient may be employed as a test statistic to test
the hypothesis of no association between two populations. We assume that the n pairs
of observations $(x_i, y_i)$ have been randomly selected, and the absence of any association
between the populations therefore implies a random assignment of the n ranks
within each sample. Each random assignment (for the two samples) represents a sample
point associated with the experiment, and a value of $r_S$ can be calculated for each. It
is possible to calculate the probability that $r_S$ assumes a large absolute value due solely
to chance and thereby suggests an association between populations when none exists.
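For small n, this probability can be found by brute force. The sketch below (the function name `spearman_lower_tail` is ours) assumes no ties, holds one set of ranks fixed at 1, 2, ..., n, and enumerates all n! equally likely orderings of the other. With n = 8 it counts the orderings whose $\sum d_i^2$ is at least 144, which corresponds to $r_S \le -.714$, the value computed from the science-teacher data.

```python
from itertools import permutations


def spearman_lower_tail(n, observed_d2):
    """P(sum of d_i^2 >= observed_d2) under random assignment of ranks (no ties).

    Because r_S = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), a large sum of squared
    differences corresponds to a small (negative) r_S.
    """
    base = range(1, n + 1)
    count = total = 0
    for perm in permutations(base):  # all n! orderings, equally likely under H0
        total += 1
        d2 = sum((a - b) ** 2 for a, b in zip(base, perm))
        if d2 >= observed_d2:
            count += 1
    return count / total


p = spearman_lower_tail(8, 144)  # enumerates 8! = 40,320 orderings
print(p)
```

The enumeration grows as n!, so tabled critical values remain the practical route for larger samples.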

The rejection region for a two-tailed test includes values of rs near +1 and near 
—1. If the alternative is that the correlation between X and Y is negative, we reject Ho 
for values of rs near —1. Similarly, if the alternative is that the correlation between 
X and Y is positive, we reject Но for large positive values of rs. 

The critical values of $r_S$ are given in Table 11, Appendix 3. Recorded across the top
of the table are values of $\alpha$ that you might wish to use for a one-tailed test of the null
hypothesis of no association between X and Y. The number of rank pairs n appears
at the left side of the table. The table entries give the critical value $r_0$ for a one-tailed
test. Thus, $P(r_S \ge r_0) = \alpha$. For example, suppose that you have n = 8 rank pairs
and the research hypothesis is that the correlation between the ranks is positive. Then,
you want to reject the null hypothesis of no association only for large positive values
of $r_S$, and you will use a one-tailed test. Referring to Table 11 and using the row
corresponding to n = 8 and the column for $\alpha = .05$, you read $r_0 = .643$. Therefore,
you reject $H_0$ for all values of $r_S$ greater than or equal to .643.

If you wish to give the p-value associated with an observed value of $r_S = .82$,
Table 11 gives that $H_0$ would be rejected with $\alpha = .025$ but not with $\alpha = .01$. Thus,
.01 < p-value < .025.

The test is conducted in exactly the same manner if you wish to test the alternative
hypothesis that the ranks are negatively correlated. The only difference is that you
reject the null hypothesis if $r_S \le -.643$. That is, you just place a minus sign in front
of the tabulated value of $r_0$ to get the lower-tail critical value. Similarly, if $r_S = -.82$,
then .01 < p-value < .025.

To conduct a two-tailed test, you reject the null hypothesis if $r_S \ge r_0$ or $r_S \le -r_0$.
The value of $\alpha$ for the test is double the value shown at the top of the table. For
example, if n = 8 and you choose the .025 column, you reject $H_0$ if $r_S \ge .738$ or
$r_S \le -.738$. The $\alpha$-value for the test is 2(.025) = .05.

The p-value associated with a two-tailed test based on an observed value of $r_S = .82$
is twice (because of the two tails) the one-tailed p-value; that is, .02 < p-value < .05.


EXAMPLE 15.13

Test the hypothesis of no association between populations for Example 15.12. Give
bounds for the associated p-value.


Solution

The critical value of $r_S$ for a one-tailed test with $\alpha = .05$ and n = 8 is .643.
Let us assume that a correlation between judge's rank and the ranks of teachers'
examination scores could not possibly be positive. (Low rank means good teaching
and should be associated with a high test score if the judge and the test both measure
teaching ability.) The alternative hypothesis is that the population rank correlation
coefficient $\rho_S$ is less than zero, so we are concerned with a one-tailed statistical test.
Thus, $\alpha$ for the test is the tabulated value .05, and we reject the null hypothesis
if $r_S \le -.643$.

The calculated value of the test statistic, $r_S = -.714$, is less than the critical
value for $\alpha = .05$. Because $H_0$ is rejected for $\alpha = .05$ but not for $\alpha = .025$, the
p-value associated with the test lies in the interval .025 < p-value < .05. Hence, the
null hypothesis is rejected at the $\alpha = .05$ level of significance. It appears that some
agreement does exist between the judge's rankings and the test scores. However, this
agreement could exist when neither provides an adequate yardstick for measuring
teaching ability. For example, the association could exist if both the judge and those
who constructed the teachers' examination possessed a completely erroneous but
similar concept of the characteristics of good teaching.
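The bounds on the p-value can also be checked by brute force. The sketch below assumes no ties and enumerates all 8! equally likely assignments of the y ranks under the null hypothesis; the variable names are ours, and the table lookup in the text remains the method of the example.

```python
from itertools import permutations

n = 8
observed_sum_d2 = 144  # from Table 15.9; r_S = 1 - 144/84 = -.714

# Under H0 every assignment of the ranks 1..n to the y's is equally likely,
# so fix the x ranks at 1..n and enumerate all n! orderings of the y ranks.
count_as_extreme = 0
total = 0
for perm in permutations(range(1, n + 1)):
    sum_d2 = sum((i - r) ** 2 for i, r in enumerate(perm, start=1))
    # The event r_S <= observed r_S is the same as sum_d2 >= observed sum_d2.
    if sum_d2 >= observed_sum_d2:
        count_as_extreme += 1
    total += 1

p_lower_tail = count_as_extreme / total
print(total, p_lower_tail)
```

Working with the integer $\sum d_i^2$ rather than with $r_S$ itself avoids floating-point comparisons when counting the arrangements at least as extreme as the observed one.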


Spearman's Rank Correlation Test 


Null hypothesis: $H_0$: There is no association between the rank pairs.

Alternative hypothesis: (1) $H_a$: There is an association between the rank
pairs (a two-tailed test), or (2) the correlation between the rank pairs is
positive (or negative) (a one-tailed test).

Test statistic:

\[
r_S = \frac{n\sum_{i=1}^{n} R(x_i)R(y_i) - \left[\sum_{i=1}^{n} R(x_i)\right]\left[\sum_{i=1}^{n} R(y_i)\right]}
{\sqrt{\left\{n\sum_{i=1}^{n} [R(x_i)]^2 - \left[\sum_{i=1}^{n} R(x_i)\right]^2\right\}
\left\{n\sum_{i=1}^{n} [R(y_i)]^2 - \left[\sum_{i=1}^{n} R(y_i)\right]^2\right\}}},
\]

where $R(x_i)$ and $R(y_i)$ denote the rank of $x_i$ among $x_1, x_2, \ldots, x_n$ and $y_i$
among $y_1, y_2, \ldots, y_n$, respectively.

Rejection region: For a two-tailed test, reject $H_0$ if $r_S \ge r_0$ or $r_S \le -r_0$,
where $r_0$ is given in Table 11, Appendix 3. Double the tabulated probability
to obtain the $\alpha$-value for the two-tailed test. For a one-tailed test, reject $H_0$
if $r_S \ge r_0$ (for an upper-tailed test) or $r_S \le -r_0$ (for a lower-tailed test). The
$\alpha$-value for a one-tailed test is the value shown in Table 11, Appendix 3.
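When ties are present, the test statistic has to be computed from the general formula rather than from the no-ties shortcut. The following is a minimal sketch, assuming tied observations receive the average of the ranks they occupy; the helper names `average_ranks` and `spearman_rs` are ours, and the second data set is made up purely to show the tie handling.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def spearman_rs(x, y):
    """r_S as the ordinary correlation coefficient of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sxy = n * sum(a * b for a, b in zip(rx, ry)) - sum(rx) * sum(ry)
    sxx = n * sum(a * a for a in rx) - sum(rx) ** 2
    syy = n * sum(b * b for b in ry) - sum(ry) ** 2
    return sxy / sqrt(sxx * syy)

# Ranks of Table 15.9 (no ties), then a small made-up sample with a tie.
print(round(spearman_rs([7, 4, 2, 6, 1, 3, 8, 5], [1, 5, 3, 4, 8, 7, 2, 6]), 3))  # -0.714
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```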
An experiment was conducted to study the relationship between the ratings of tobacco-leaf
graders and the moisture content of the corresponding tobacco leaves. Twelve leaves were 
rated by the grader on a scale of 1 to 10, and corresponding measurements on moisture content 
were made on the same leaves. The data are shown in the following table. Calculate rs. Do the 
data provide sufficient evidence to indicate an association between the grader's rating and the 
moisture content of the leaves? Explain. 


Leaf   Grader's Rating   Moisture Content

  1          9               .22
  2          6               .16
  3          7               (illegible)
  4          7               .14
  5          5               .12
  6          8               .19
  7          2               .10
  8          6               .12
  9          1               .05
 10         10               .20
 11          9               .16
 12          3               .09


Manufacturers of perishable foods often use preservatives to retard spoilage. One concern is 
that too much preservative will change the flavor of the food. An experiment is conducted using 
portions of food products with varying amounts of preservative added. The length of time until 
the food begins to spoil and a taste rating are recorded for each portion of food. The taste rating 
is the average rating for three tasters, each of whom rated each food portion on a scale from 1 
(bad) to 5 (good). Twelve measurements are shown in the following table. Use a nonparametric 
test to determine whether spoilage times and taste ratings are correlated. Give the associated 
p-value and indicate the appropriate conclusion for an $\alpha = .05$ level test.


Food Portion   Days until Spoilage   Taste Rating

     1                 30                4.3
     2                 47                3.6
     3                 26                4.5
     4                 94                2.8
     5                 67                3.3
     6                 83                2.7
     7                 36                4.2
     8                 77                3.9
     9                 43                3.6
    10                109                2.2
    11                 56                3.1
    12                 70                2.9


A large corporation selects graduates for employment by using both interviews and a
psychological achievement test. Interviews conducted at the home office of the company were
far more expensive than the test, which could be conducted on campus. Consequently, the
personnel office was interested in determining whether the test scores were correlated with
interview ratings and whether the tests could be substituted for interviews. The idea was not
to eliminate interviews but to reduce their number. Ten prospects were ranked during
interviews and then tested. The paired scores were as shown in the accompanying table.


Subject   Interview Rank   Test Score

   1                            74
   2                            81
   3                            66
   4                            83
   5                            66
   6                            94
   7                            96
   8                            70
   9                            61
  10                            86

(The Interview Rank column is not recoverable from the source.)


a Calculate the Spearman rank correlation coefficient rs. Rank 1 is assigned to the candidate 
judged to be the best. 

b Do the data present sufficient evidence to indicate that the correlation between interview 
rankings and test scores is less than zero? If such evidence does exist, can we say that tests 
could be used to reduce the number of interviews? 


A political scientist wished to examine the relationship of the voter image of a conservative 
political candidate and the distance in miles between the residence of the voter and the residence 
of the candidate. Each of 12 voters rated the candidate on a scale of 1 to 20. The resulting data 
are shown in the following table. 


Voter Rating Distance 


1 12 75
2 7 165
3 5 300 
4 19 15 
5 17 180 
6 12 240 
7 9 120 
8 18 60 
9 3 230 
10 8 200 
11 15 130 
12 4 130 


a Calculate the Spearman rank correlation coefficient, rs. 


b Do these data provide sufficient evidence to indicate a negative correlation between rating 
and distance? 


Refer to Exercise 15.12. Compute Spearman's rank correlation coefficient for these data and
test $H_0: \rho_S = 0$ at the 10% level of significance.


The data shown in the accompanying table give measures of bending and twisting stiffness as 
measured by engineering tests for 12 tennis racquets. 
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Bending Twisting 
Racquet Stiffness Stiffness 


1 419 227 
2 407 231 
3 363 200 
4 360 211 
5 257 182 
6 622 304 
7 424 384 
8 359 194 
9 346 158 
10 556 225 
11 474 305 
12 441 235 


a Calculate the value of the rank correlation coefficient $r_S$ between bending stiffness and
twisting stiffness.

b Use the test based on the rank correlation coefficient to determine whether there is a
significant positive relationship between bending stiffness and twisting stiffness. Use
$\alpha = .05$.


Refer to Exercise 11.4. Regard both book and audited values as random variables and test for 
positive correlation between the two by using Spearman’s rank correlation coefficient. Give 
bounds for the p-value associated with the test. 


Refer to Exercise 11.8. Treating both flow-through and static values as random variables, test for
the presence of a correlation between the two by using Spearman's rank correlation coefficient,
with $\alpha = .10$.


Some General Comments on 
Nonparametric Statistical Tests 


The nonparametric statistical tests presented in the preceding pages represent only 
a few of the many nonparametric statistical methods of inference available. A much 
larger collection of nonparametric procedures, along with worked examples, is given 
in the texts listed in the references [for instance, see Conover (1999), Hollander and 
Wolfe (1999), and Daniel (2000)]. Many of the nonparametric hypotheses-testing 
procedures can be adapted to provide associated point and interval estimators for 
location parameters and differences in location parameters. Nonparametric proce- 
dures are also available for handling some of the inferential problems associated with 
the linear model. 

We have indicated that nonparametric testing procedures are particularly useful 
when experimental observations are susceptible to ordering but cannot be measured 
on a quantitative scale. Parametric statistical procedures can rarely be applied to 
this type of data. Hence, any inferential procedures must be based on nonparametric 
methods. 
A second application of nonparametric statistical methods is in testing hypotheses
associated with populations of quantitative data when uncertainty exists concerning 
the satisfaction of assumptions about the form of the population distributions. Just 
how useful are nonparametric methods for this situation? Nonparametric statistical 
methods are rapid and often lead to an immediate decision in testing hypotheses. When 
experimental conditions depart substantially from the basic assumptions underlying 
parametric tests, the response measurements often can be transformed to alleviate 
the condition, but an unfortunate consequence frequently develops: The transformed 
response is no longer meaningful from a practical point of view, and analysis of the 
transformed data no longer answers the objectives of the experimenter. The use of 
nonparametric methods often circumvents this difficulty. Finally, notice that many
nonparametric methods are nearly as efficient as their parametric counterparts when the
assumptions underlying the parametric procedures are true; and as noted earlier, they 
could be more efficient when the assumptions are not satisfied. These reasons suggest 
that nonparametric techniques play a very useful role in statistical methodology. 
Supplementary Exercises


Text not available due to copyright restrictions 


Two gourmets, A and B, rated 20 meals on a scale of 1 to 10. The data are shown in the 
accompanying table. Do the data provide sufficient evidence to indicate that one of the gourmets 
tends to give higher ratings than the other? Test by using the sign test with a value of $\alpha$ near .05.


Meal A B Meal A B
1 6 8 11 6 9 
2 4 5 12 8 5 
3 7 4 13 4 2 
4 8 7 14 3 3 
5 2 3 15 6 8 
6 7 4 16 9 10 
7 9 9 17 9 8 
8 ү 8 18 4 6 
9 2 5 19 4 3 

10 4 3 20 5 5 


Refer to the comparison of gourmet meal ratings in Exercise 15.62 and use the Wilcoxon 
signed-rank test to determine whether the data provide sufficient evidence to indicate a
difference in the ratings of the two gourmets. Test by using a value of $\alpha$ near .05. Compare the results
of this test with the results of the sign test in Exercise 15.62. Are the test conclusions consistent? 
In an investigation of the visual-scanning behavior of deaf children, measurements of
eye-movement rates were taken on nine deaf and nine hearing children. From the data given in
the table, is there sufficient evidence to justify claiming that the distributions of eye-movement
rates differ for deaf children A and hearing children B?


Deaf Children Hearing Children 


A B 
2.75 (15) .89 (1) 
2.14 (11) 1.43 (7) 
3.23 (18) 1.06 (4) 
2.07 (10) 1.01 (3) 
2.49 (14) .94 (2)
2.18 (12) 1.79 (8) 
3.16 (17) 1.12 (5.5) 
2.93 (16) 2.01 (9) 
2.20 (13) 1.12 (5.5) 

Rank Sum 126 45 


A comparison of reaction (in seconds) to two different stimuli in a psychological word- 
association experiment produced the results in the accompanying table when applied to a 
random sample of 16 people. Do the data present sufficient evidence to indicate a difference 
in location for the distributions of reaction times for the two stimuli? Use the Mann-Whitney 
U statistic and test with $\alpha = .05$. (Note: This test was conducted by using Student's t in
Exercise 13.3. Compare your results.) 


Stimulus 1   Stimulus 2

(The table entries are not recoverable from the source.)


If (as in the case of measurements produced by two well-calibrated instruments) the means
of two populations are equal, the Mann-Whitney U statistic can be used to test hypotheses
concerning the population variances (or more general measures of variability) as follows.
As in Section 15.6, identify population I as the population from which the smaller sample
size is taken. Rank the combined sample. Number the ranked observations from the outside
in; that is, number the smallest observation 1; the largest, 2; the next to smallest, 3; the next
to largest, 4; and so on. This final sequence of numbers induces an ordering on the symbols
x (sample I observations) and y (sample II observations). If $\sigma_X^2 < \sigma_Y^2$, one would expect to
find a preponderance of x's with high ranks and thus a relatively large sum of ranks for the x
observations. Conversely, if $\sigma_X^2 > \sigma_Y^2$, most x's would have low ranks, and the sum of the ranks
of the x observations would be small.


a Given the measurements in the accompanying table, produced by well-calibrated precision
instruments, A and B, test at near the $\alpha = .05$ level to determine whether the more
expensive instrument B is more precise than A. (Notice that this implies a one-tailed test.) Use
the Mann-Whitney U test.
Instrument A Instrument B


1060.21 1060.24 
1060.34 1060.28 
1060.27 1060.32 
1060.36 1060.30 
1060.40 


b Test by using the F statistic of Section 10.9. 


Calculate the probability that $U < 2$ for $n_1 = n_2 = 5$. Assume that no ties occur and that $H_0$
is true.


Calculate the probability that the Wilcoxon T (Section 15.4) is less than or equal to 2 for n = 3
pairs. Assume that no ties occur and that $H_0$ is true.


To investigate possible differences among production rates for three production lines turning 
out similar items, examiners took independent random samples of total production figures for 
7 days for each line. The resulting data appear in the following table. Do the data provide suf- 
ficient evidence to indicate any differences in location for the three sets of production figures, 
at the 5% significance level?


Line 1  Line 2  Line 3


48 41 18 
43 36 42 
39 29 28 
57 40 38 
21 35 15 
47 45 33 
58 32 31 


a Suppose that a company wants to study how personality relates to leadership. Four
supervisors (I, II, III, and IV) with different types of personalities are selected. Several
employees are then selected from the group supervised by each, and these employees are
asked to rate the leader of their group on a scale from 1 to 20 (20 signifies highly favorable).
The accompanying table shows the resulting data. Is there sufficient evidence to indicate that
one or more of the supervisors tend to receive higher ratings than the others? Use $\alpha = 0.05$.


I II III IV


20 17 16 8 
19 11 15 12 
20 13 13 10 
18 15 18 14 
17 14 11 9

16 10 


b Suppose that the company is particularly interested in comparing the ratings of the
personality types represented by supervisors I and III. Make this comparison, using $\alpha = .05$.


The leaders of a labor union want to determine its members' preferences before negotiating 
with management. Ten union members are randomly selected, and each member completed an 
extensive questionnaire. The responses to the various aspects of the questionnaire will enable 
the union to rank, in order of importance, the items to be negotiated. The sample rankings are
shown in the accompanying table. Is there sufficient evidence to indicate that one or more of
the items are preferred to the others? Test using $\alpha = .05$.


Person   More Pay   Job Stability   Fringe Benefits   Shorter Hours

   1        2            1                3                 4
   2        1            2                3                 4
   3        4            3                2                 1
   4        1            4                2                 3
   5        1            2                3                 4
   6        1            3                4                 2
   7        2.5          1                2.5               4
   8        3            1                4                 2
   9        1.5          1.5              3                 4
  10        2            3                1                 4


Six groups of three children matched for IQ and age were formed. Each child was taught the 
concept of time by using one of three methods: lecture, demonstration, or teaching machine. 
The scores shown in the following table indicate the students’ performance when they were 
tested on how well they had grasped the concept. Is there sufficient evidence to indicate that 
the teaching methods differ in effectiveness? Give bounds for the p-value. 


Group Lecture Demonstration Teaching Machine 


1 20 22 24 
2 25 25 27 
3 30 40 39 
4 37 26 41
5 24 20 21 
6 16 18 25 


Calculate $P(R < 6)$ for the runs test, where $n_1 = n_2 = 8$ and $H_0$ is true. Do not use Table 10,
Appendix 3.


Consider a Wilcoxon rank-sum test for the comparison of two probability distributions based
on independent random samples of $n_1 = n_2 = 5$. Find $P(W < 17)$, assuming that $H_0$ is true.


For the sample from population I, let U denote the Mann-Whitney statistic and let W denote
the Wilcoxon rank-sum statistic. Show that

\[
U = n_1 n_2 + (1/2) n_1 (n_1 + 1) - W.
\]


Refer to Exercise 15.75.

a Show that $E(U) = (1/2) n_1 n_2$ when $H_0$ is true.

b Show that $V(U) = (1/12)[n_1 n_2 (n_1 + n_2 + 1)]$ when $H_0$ is true, where $H_0$ states that the
two populations have identical distributions.


Let T denote the Wilcoxon signed-rank statistic for n pairs of observations. Show that $E(T) =
(1/4)n(n + 1)$ and $V(T) = (1/24)[n(n + 1)(2n + 1)]$ when the two populations are identical.
Observe that these properties do not depend on whether T is constructed from negative or
positive differences.
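Refer to the Spearman rank correlation coefficient of Section 15.10. Show that, when there are
no ties in either the x observations or the y observations, then

\[
r_S = \frac{n\sum_{i=1}^{n} R(x_i)R(y_i) - \left[\sum_{i=1}^{n} R(x_i)\right]\left[\sum_{i=1}^{n} R(y_i)\right]}
{\sqrt{\left\{n\sum_{i=1}^{n} [R(x_i)]^2 - \left[\sum_{i=1}^{n} R(x_i)\right]^2\right\}
\left\{n\sum_{i=1}^{n} [R(y_i)]^2 - \left[\sum_{i=1}^{n} R(y_i)\right]^2\right\}}}
= 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},
\]

where $d_i = R(x_i) - R(y_i)$.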


CHAPTER 16


Introduction to Bayesian
Methods for Inference


16.1 Introduction
16.2 Bayesian Priors, Posteriors, and Estimators
16.3 Bayesian Credible Intervals
16.4 Bayesian Tests of Hypotheses
16.5 Summary and Additional Comments

References and Further Readings


16.1 Introduction


We begin this chapter with an example that illustrates the concepts and an application 
of the Bayesian approach to inference making. Suppose that we are interested in 
estimating the proportion of responders to a new therapy for treating a disease that 
is serious and difficult to cure (such a disease is said to be virulent). If p denotes 
the probability that any single person with the disease responds to the treatment, the 
number of responders Y in a sample of size n might reasonably be assumed to have 
a binomial distribution with parameter p. In previous chapters, we have viewed the 
parameter p as having a fixed but unknown value and have discussed point estimators, 
interval estimators, and tests of hypotheses for this parameter. Before we even collect 
any data, our knowledge that the disease is virulent might lead us to believe that the 
value of p is likely to be relatively small, perhaps in the neighborhood of .25. How 
can we use this information in the process of making inferences about p? 

One way to use this prior information about p is to utilize a Bayesian approach. In
this approach, we model the conditional distribution of Y given p, Y | p, as binomial:

\[
p(y \mid p) = \binom{n}{y} p^{y} (1 - p)^{n - y}, \qquad y = 0, 1, 2, \ldots, n.
\]


Uncertainty about the parameter p is handled by treating it as a random variable
and, before observing any data, assigning a prior distribution to p. Because we know
that 0 < p < 1 and the beta density function has the interval (0, 1) as support, it is
convenient to use a beta distribution as a prior for p. But which beta distribution
should we use? Since the mean of a beta-distributed random variable with parameters
$\alpha$ and $\beta$ is $\mu = \alpha/(\alpha + \beta)$ and we thought p might be in the neighborhood of .25,
we might choose to use a beta distribution with $\alpha = 1$ and $\beta = 3$ (and $\mu = .25$) as
the prior for p. Thus, the density assigned to p is

\[
g(p) = 3(1 - p)^2, \qquad 0 < p < 1.
\]

Since we have specified the conditional distribution of Y | p and the distribution
of p, we have also specified the joint distribution of (Y, p) and can determine the
marginal distribution of Y and the conditional distribution of p | Y. After observing
Y = y, the posterior density of p given Y = y, $g^*(p \mid y)$, can be determined. In the
next section, we derive a general result that, in our virulent-disease example, implies
that the posterior density of p given Y = y is

\[
g^*(p \mid y) = \frac{\Gamma(n + 4)}{\Gamma(y + 1)\,\Gamma(n - y + 3)}\, p^{y} (1 - p)^{n - y + 2}, \qquad 0 < p < 1.
\]

Notice that the posterior density for p | y is a beta density with $\alpha = y + 1$ and
$\beta = n - y + 3$. This posterior density is the "updated" (by the data) density of p and is the
basis for all Bayesian inferences regarding p. In the following sections, we describe
the general Bayesian approach and specify how to use the posterior density to obtain
estimates, credible intervals, and hypothesis tests for p and for parameters associated
with other distributions.


16.2 Bayesian Priors, Posteriors, and Estimators
If $Y_1, Y_2, \ldots, Y_n$ denote the random variables associated with a sample of size n,
we previously used the notation $L(y_1, y_2, \ldots, y_n \mid \theta)$ to denote the likelihood of the
sample. In the discrete case, this function is defined to be the joint probability
$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$, and in the continuous case, it is the joint density of
$Y_1, Y_2, \ldots, Y_n$ evaluated at $y_1, y_2, \ldots, y_n$. The parameter $\theta$ is included among the
arguments of $L(y_1, y_2, \ldots, y_n \mid \theta)$ to denote that this function depends explicitly on
the value of some parameter $\theta$. In the Bayesian approach, the unknown parameter
$\theta$ is viewed to be a random variable with a probability distribution, called the prior
distribution of $\theta$. This prior distribution is specified before any data are collected and
provides a theoretical description of information about $\theta$ that was available before
any data were obtained. In our initial discussion, we will assume that the parameter
$\theta$ has a continuous distribution with density $g(\theta)$ that has no unknown parameters.

Using the likelihood of the data and the prior on $\theta$, it follows that the joint likelihood
of $Y_1, Y_2, \ldots, Y_n, \theta$ is

\[
f(y_1, y_2, \ldots, y_n, \theta) = L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)
\]

and that the marginal density or mass function of $Y_1, Y_2, \ldots, Y_n$ is

\[
m(y_1, y_2, \ldots, y_n) = \int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta.
\]

Finally, the posterior density of $\theta \mid y_1, y_2, \ldots, y_n$ is

\[
g^*(\theta \mid y_1, y_2, \ldots, y_n) =
\frac{L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)}
{\int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta}.
\]

The posterior density summarizes all of the pertinent information about the
parameter $\theta$ by making use of the information contained in the prior for $\theta$ and the
information in the data.
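The three formulas above (joint, marginal, posterior) can be mimicked numerically on a grid. The sketch below applies them to the virulent-disease model of Section 16.1, with the beta(1, 3) prior $g(p) = 3(1 - p)^2$ and a binomial likelihood. The data values n = 20 and y = 6 are hypothetical, and the midpoint rule is simply one convenient way to approximate the integral.

```python
from math import comb, gamma

n, y = 20, 6            # hypothetical data, not taken from the text
grid_size = 2000
ps = [(k + 0.5) / grid_size for k in range(grid_size)]   # midpoints of (0, 1)

def likelihood(p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def prior(p):
    return 3 * (1 - p) ** 2          # beta(1, 3) density

joint = [likelihood(p) * prior(p) for p in ps]
marginal = sum(joint) / grid_size    # midpoint-rule approximation of m(y)
posterior_grid = [j / marginal for j in joint]

def beta_pdf(p, a, b):
    return gamma(a + b) / (gamma(a) * gamma(b)) * p ** (a - 1) * (1 - p) ** (b - 1)

# Largest gap between the grid posterior and the beta(y + 1, n - y + 3) density.
max_gap = max(abs(g - beta_pdf(p, y + 1, n - y + 3))
              for g, p in zip(posterior_grid, ps))
print(max_gap)
```

The gap should be negligible, in line with the beta(y + 1, n - y + 3) posterior stated in Section 16.1.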


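EXAMPLE 16.1

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Bernoulli distribution where
$P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$, and assume that the prior distribution for p
is beta $(\alpha, \beta)$. Find the posterior distribution for p.


Solution

Since the Bernoulli probability function can be written as

\[
p(y_i \mid p) = p^{y_i} (1 - p)^{1 - y_i}, \qquad y_i = 0, 1,
\]

the likelihood $L(y_1, y_2, \ldots, y_n \mid p)$ is

\[
\begin{aligned}
L(y_1, y_2, \ldots, y_n \mid p) &= p(y_1, y_2, \ldots, y_n \mid p) \\
&= p^{y_1}(1 - p)^{1 - y_1} \times p^{y_2}(1 - p)^{1 - y_2} \times \cdots \times p^{y_n}(1 - p)^{1 - y_n} \\
&= p^{\sum y_i} (1 - p)^{n - \sum y_i}, \qquad y_i = 0, 1 \text{ and } 0 < p < 1.
\end{aligned}
\]

Thus,

\[
\begin{aligned}
f(y_1, y_2, \ldots, y_n, p) &= L(y_1, y_2, \ldots, y_n \mid p) \times g(p) \\
&= p^{\sum y_i} (1 - p)^{n - \sum y_i} \times \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1} \\
&= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\sum y_i + \alpha - 1} (1 - p)^{n - \sum y_i + \beta - 1}
\end{aligned}
\]

and

\[
\begin{aligned}
m(y_1, y_2, \ldots, y_n) &= \int_0^1 \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\sum y_i + \alpha - 1} (1 - p)^{n - \sum y_i + \beta - 1}\, dp \\
&= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot
\frac{\Gamma\left(\sum y_i + \alpha\right)\Gamma\left(n - \sum y_i + \beta\right)}{\Gamma(n + \alpha + \beta)}.
\end{aligned}
\]

Finally, the posterior density of p is

\[
\begin{aligned}
g^*(p \mid y_1, y_2, \ldots, y_n) &=
\frac{\dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\sum y_i + \alpha - 1} (1 - p)^{n - \sum y_i + \beta - 1}}
{\dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot
\dfrac{\Gamma\left(\sum y_i + \alpha\right)\Gamma\left(n - \sum y_i + \beta\right)}{\Gamma(n + \alpha + \beta)}} \\
&= \frac{\Gamma(n + \alpha + \beta)}{\Gamma\left(\sum y_i + \alpha\right)\Gamma\left(n - \sum y_i + \beta\right)}\,
p^{\sum y_i + \alpha - 1} (1 - p)^{n - \sum y_i + \beta - 1}, \qquad 0 < p < 1,
\end{aligned}
\]

a beta density with parameters $\alpha^* = \sum y_i + \alpha$ and $\beta^* = n - \sum y_i + \beta$. ■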
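In code, the result of Example 16.1 amounts to adding the number of successes to the first prior parameter and the number of failures to the second. The sketch below also evaluates the closed form for the marginal found in the example; the function names and the five-observation sample are hypothetical, and log-gamma is used only to avoid overflow for large arguments.

```python
from math import lgamma, exp

def beta_posterior(alpha, beta, ys):
    """(alpha*, beta*) for Bernoulli observations ys under a beta(alpha, beta) prior."""
    s, n = sum(ys), len(ys)
    return alpha + s, beta + n - s

def log_marginal(alpha, beta, ys):
    """log m(y_1, ..., y_n) from the closed form in Example 16.1."""
    a_star, b_star = beta_posterior(alpha, beta, ys)
    return (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
            + lgamma(a_star) + lgamma(b_star) - lgamma(a_star + b_star))

ys = [1, 0, 0, 1, 0]                 # hypothetical sample: n = 5, sum = 2
print(beta_posterior(1, 3, ys))      # (3, 6)
print(exp(log_marginal(1, 3, ys)))   # marginal probability of this particular sequence
```

With a beta(1, 3) prior, the marginal probability of this particular sequence works out to 3 Γ(3)Γ(6)/Γ(9) = 1/56.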
Before we proceed, let's look at some of the implications of the result in Example
16.1. In the following example, we'll compare the prior and posterior distributions 
for some (for now) arbitrary choices of the parameters of the prior and the results of 
the experiment. 


EXAMPLE 16.2

Consider the virulent-disease scenario and the results of Example 16.1. Compare
the prior and posterior distributions of the Bernoulli parameter p (the proportion of
responders to the new therapy) if we chose the values for $\alpha$ and $\beta$ and observed the
hypothetical data given below:

a $\alpha = 1$, $\beta = 3$, $n = 5$, $\sum y_i = 2$.
b $\alpha = 1$, $\beta = 3$, $n = 25$, $\sum y_i = 10$.
c $\alpha = 10$, $\beta = 30$, $n = 5$, $\sum y_i = 2$.
d $\alpha = 10$, $\beta = 30$, $n = 25$, $\sum y_i = 10$.


Solution

Before we proceed, notice that both beta priors have mean

\[
\mu = \frac{\alpha}{\alpha + \beta} = \frac{1}{1 + 3} = \frac{10}{10 + 30} = .25
\]

and that both hypothetical samples result in the same value of the maximum likelihood
estimates (MLEs) for p:

\[
\hat{p} = \frac{1}{n}\sum y_i = \frac{2}{5} = \frac{10}{25} = .40.
\]

As derived in Example 16.1, if $y_1, y_2, \ldots, y_n$ denote the values in a random sample
from a Bernoulli distribution, where $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$, and the
prior distribution for p is beta $(\alpha, \beta)$, the posterior distribution for p is beta
$(\alpha^* = \sum y_i + \alpha,\ \beta^* = n - \sum y_i + \beta)$. Therefore, for the choices in this example,

a when the prior is beta (1, 3), $n = 5$, $\sum y_i = 2$, the posterior is beta with
  $\alpha^* = \sum y_i + \alpha = 2 + 1 = 3$ and $\beta^* = n - \sum y_i + \beta = 5 - 2 + 3 = 6$.
b when the prior is beta (1, 3), $n = 25$, $\sum y_i = 10$, the posterior is beta with
  $\alpha^* = 10 + 1 = 11$ and $\beta^* = 25 - 10 + 3 = 18$.
c when the prior is beta (10, 30), $n = 5$, $\sum y_i = 2$, the posterior is beta with
  $\alpha^* = 2 + 10 = 12$ and $\beta^* = 5 - 2 + 30 = 33$.
d when the prior is beta (10, 30), $n = 25$, $\sum y_i = 10$, the posterior is beta with
  $\alpha^* = 20$ and $\beta^* = 45$.

Recall that the mean and variance of a beta $(\alpha, \beta)$ distributed random variable are

\[
\mu = \frac{\alpha}{\alpha + \beta} \qquad \text{and} \qquad
\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.
\]
The parameters of the previous beta priors and posteriors, along with their means
and variances, are summarized in Table 16.1. Figure 16.1(a) contains graphs of the beta
distributions (priors and posteriors) associated with the beta prior with parameters
$\alpha = 1$, $\beta = 3$. Graphs of the beta distributions associated with the beta (10, 30) prior
are given in Figure 16.1(b). ■

Table 16.1 Beta priors and posteriors for Example 16.2

                            Parameters of
Distribution    n   Σy_i    Beta Distribution     Mean    Variance

Prior           -    -      α = 1,   β = 3        .2500    .0375
Posterior       5    2      α* = 3,  β* = 6       .3333    .0222
Posterior      25   10      α* = 11, β* = 18      .3793    .0078
Prior           -    -      α = 10,  β = 30       .2500    .0046
Posterior       5    2      α* = 12, β* = 33      .2667    .0043
Posterior      25   10      α* = 20, β* = 45      .3077    .0032

[Figure 16.1: graphs of the beta priors and posteriors for Example 16.2. Panel (a): the
beta (1, 3) prior and its posteriors. Panel (b): the beta (10, 30) prior with the
beta (12, 33) and beta (20, 45) posteriors.]
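The means and variances of the priors and posteriors in Example 16.2 follow directly from the beta mean and variance formulas, so they are easy to recompute. The following sketch does so for the four cases; the helper name `beta_mean_var` is ours.

```python
def beta_mean_var(a, b):
    """Mean and variance of a beta(a, b) random variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# (prior alpha, prior beta, n, sum of y_i) for parts a-d of Example 16.2
cases = [(1, 3, 5, 2), (1, 3, 25, 10), (10, 30, 5, 2), (10, 30, 25, 10)]

for a, b, n, s in cases:
    a_star, b_star = a + s, b + n - s
    prior_m, prior_v = beta_mean_var(a, b)
    post_m, post_v = beta_mean_var(a_star, b_star)
    print(f"prior beta({a},{b}): {prior_m:.4f} {prior_v:.4f} | "
          f"posterior beta({a_star},{b_star}): {post_m:.4f} {post_v:.4f}")
```

In each case the posterior mean lies between the prior mean .25 and the MLE .40, moving closer to the MLE when n is larger and staying closer to the prior mean when the prior variance is smaller.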


In Examples 16.1 and 16.2, we obtained posterior densities that, like the prior, are
beta densities but with altered (by the data) parameter values.


DEFINITION 16.1

Prior distributions that result in posterior distributions that are of the same
functional form as the prior but with altered parameter values are called conjugate
priors.


Any beta distribution is a conjugate prior distribution for a Bernoulli (or a binomial)
distribution. When the prior is updated (using the data), the result is a beta posterior
with altered parameter values. This is computationally convenient since we can
determine the exact formula for the posterior and thereafter use previously developed
properties of a familiar distribution. For the distributions that we use in this chapter,
there are conjugate priors associated with the relevant parameters. These families
of conjugate priors are often viewed to be broad enough to handle most practical
situations. As a result, conjugate priors are often used in practice.

Since the posterior is a bona fide probability density function, some summary
characteristic of this density provides an estimate for $\theta$. For example, we could use
the mean, the median, or the mode of the posterior density as our estimator. If we
are interested in estimating some function of $\theta$, say $t(\theta)$, we will use the posterior
expected value of $t(\theta)$ as our estimator for this function of $\theta$.


DEFINITION 16.2

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample with likelihood function
$L(y_1, y_2, \ldots, y_n \mid \theta)$, and let $\theta$ have prior density $g(\theta)$. The posterior Bayes
estimator for $t(\theta)$ is given by

\[
\widehat{t(\theta)}_B = E\left(t(\theta) \mid Y_1, Y_2, \ldots, Y_n\right).
\]


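EXAMPLE 16.3

In Example 16.1, we found the posterior distribution of the Bernoulli parameter p
based on a beta prior with parameters $(\alpha, \beta)$. Find the Bayes estimators for p and
$p(1 - p)$. [Recall that $p(1 - p)$ is the variance of a Bernoulli random variable with
parameter p.]


Solution

In Example 16.1, we found the posterior density of p to be a beta density with
parameters $\alpha^* = \sum y_i + \alpha$ and $\beta^* = n - \sum y_i + \beta$:

\[
g^*(p \mid y_1, y_2, \ldots, y_n) = \frac{\Gamma(\alpha^* + \beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)}\,
p^{\alpha^* - 1} (1 - p)^{\beta^* - 1}, \qquad 0 < p < 1.
\]

The estimate for p is the posterior mean of p. From our previous study of the beta
distribution, we know that

\[
\hat{p}_B = E(p \mid y_1, y_2, \ldots, y_n) = \frac{\alpha^*}{\alpha^* + \beta^*}
= \frac{\sum y_i + \alpha}{\sum y_i + \alpha + n - \sum y_i + \beta}
= \frac{\sum y_i + \alpha}{n + \alpha + \beta}.
\]

Similarly,

\[
\begin{aligned}
\widehat{[p(1 - p)]}_B &= E\left(p(1 - p) \mid y_1, y_2, \ldots, y_n\right) \\
&= \int_0^1 p(1 - p)\, \frac{\Gamma(\alpha^* + \beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)}\, p^{\alpha^* - 1} (1 - p)^{\beta^* - 1}\, dp \\
&= \int_0^1 \frac{\Gamma(\alpha^* + \beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)}\, p^{\alpha^*} (1 - p)^{\beta^*}\, dp \\
&= \frac{\Gamma(\alpha^* + \beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)} \times
\frac{\Gamma(\alpha^* + 1)\Gamma(\beta^* + 1)}{\Gamma(\alpha^* + \beta^* + 2)} \\
&= \frac{\Gamma(\alpha^* + \beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)} \times
\frac{\alpha^*\Gamma(\alpha^*)\,\beta^*\Gamma(\beta^*)}{(\alpha^* + \beta^* + 1)(\alpha^* + \beta^*)\Gamma(\alpha^* + \beta^*)} \\
&= \frac{\alpha^*\beta^*}{(\alpha^* + \beta^* + 1)(\alpha^* + \beta^*)} \\
&= \frac{\left(\sum y_i + \alpha\right)\left(n - \sum y_i + \beta\right)}{(n + \alpha + \beta + 1)(n + \alpha + \beta)}.
\end{aligned}
\]

So, the Bayes estimators for p and $p(1 - p)$ are

\[
\hat{p}_B = \frac{\sum Y_i + \alpha}{n + \alpha + \beta}
\qquad \text{and} \qquad
\widehat{[p(1 - p)]}_B = \frac{\left(\sum Y_i + \alpha\right)\left(n - \sum Y_i + \beta\right)}{(n + \alpha + \beta + 1)(n + \alpha + \beta)}. \qquad ■
\]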

Further examination of the Bayes estimator for р given in Example 16.3 yields 


Р DY +a 
E п+а + В 


(x) (=) 5) (5) 
ксы ш сс; 


Thus, we see that the Bayes estimator for $p$ is a weighted average of the sample mean,
$\bar{Y}$ (the MLE for $p$), and the mean of the beta prior assigned to $p$. Notice that the prior
mean of $p$ is given less weight for larger sample sizes whereas the weight given to the
sample mean increases for larger sample sizes. Also, since $E(\bar{Y}) = p$, it is easy to
see that the Bayes estimator for $p$ is not an unbiased estimator. Generally speaking,
Bayes estimators are not unbiased.

Notice that the estimators obtained in Example 16.3 are both functions of the
sufficient statistic $\sum Y_i$. This is no coincidence since a Bayes estimator is always a
function of a sufficient statistic, a result that follows from the factorization criterion
(see Theorem 9.4).

If $U$ is a sufficient statistic for the parameter $\theta$ based on a random sample $Y_1,
Y_2, \ldots, Y_n$, then

$$L(y_1, y_2, \ldots, y_n \mid \theta) = k(u, \theta) \times h(y_1, y_2, \ldots, y_n),$$

where $k(u, \theta)$ is a function only of $u$ and $\theta$ and $h(y_1, y_2, \ldots, y_n)$ is not a function
of $\theta$. In addition (see Hogg, McKean, and Craig, 2005), the function $k(u, \theta)$ can (but
need not) be chosen to be the probability mass or density function of the statistic $U$.
In accord with the notation in this chapter, we write the conditional density of $U \mid \theta$
as $k(u \mid \theta)$. Then, because $h(y_1, y_2, \ldots, y_n)$ is not a function of $\theta$,

$$\begin{aligned}
g^*(\theta \mid y_1, y_2, \ldots, y_n) &= \frac{L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)}{\int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta} \\
&= \frac{k(u \mid \theta) \times h(y_1, y_2, \ldots, y_n) \times g(\theta)}{\int_{-\infty}^{\infty} k(u \mid \theta) \times h(y_1, y_2, \ldots, y_n) \times g(\theta)\, d\theta} \\
&= \frac{k(u \mid \theta) \times g(\theta)}{\int_{-\infty}^{\infty} k(u \mid \theta) \times g(\theta)\, d\theta}.
\end{aligned}$$

Therefore, in cases where the distribution of a sufficient statistic $U$ is known, the
posterior can be determined by using the conditional density of $U \mid \theta$. We illustrate
with the following example.


EXAMPLE 16.4

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal population with unknown
mean $\mu$ and known variance $\sigma_o^2$. The conjugate prior distribution for $\mu$ is a normal
distribution with known mean $\eta$ and known variance $\delta^2$. Find the posterior distribution
and the Bayes estimator for $\mu$.


Solution

Since $U = \sum Y_i$ is a sufficient statistic for $\mu$ and is known to have a normal distribution
with mean $n\mu$ and variance $n\sigma_o^2$,


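$$L(u \mid \mu) = \frac{1}{\sqrt{2\pi n\sigma_o^2}} \exp\left[-\frac{1}{2n\sigma_o^2}(u - n\mu)^2\right], \qquad -\infty < u < \infty,$$

and the joint density of $U$ and $\mu$ is

$$f(u, \mu) = L(u \mid \mu) \times g(\mu) = \frac{1}{\sqrt{2\pi n\sigma_o^2}\,\sqrt{2\pi\delta^2}} \exp\left[-\frac{1}{2n\sigma_o^2}(u - n\mu)^2 - \frac{1}{2\delta^2}(\mu - \eta)^2\right],$$

$$-\infty < u < \infty, \qquad -\infty < \mu < \infty.$$

Let us look at the quantity in the above exponent:

$$\begin{aligned}
&-\frac{1}{2n\sigma_o^2}(u - n\mu)^2 - \frac{1}{2\delta^2}(\mu - \eta)^2 \\
&\quad= -\frac{1}{2n\sigma_o^2\delta^2}\left[\delta^2(u - n\mu)^2 + n\sigma_o^2(\mu - \eta)^2\right] \\
&\quad= -\frac{1}{2n\sigma_o^2\delta^2}\left[\delta^2u^2 - 2\delta^2un\mu + \delta^2n^2\mu^2 + n\sigma_o^2\mu^2 - 2n\sigma_o^2\mu\eta + n\sigma_o^2\eta^2\right] \\
&\quad= -\frac{1}{2n\sigma_o^2\delta^2}\left[(n^2\delta^2 + n\sigma_o^2)\mu^2 - 2(n\delta^2u + n\sigma_o^2\eta)\mu + \delta^2u^2 + n\sigma_o^2\eta^2\right] \\
&\quad= -\frac{1}{2\sigma_o^2\delta^2}\left[(n\delta^2 + \sigma_o^2)\mu^2 - 2(\delta^2u + \sigma_o^2\eta)\mu\right] - \frac{1}{2n\sigma_o^2\delta^2}\left(\delta^2u^2 + n\sigma_o^2\eta^2\right) \\
&\quad= -\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left[\mu^2 - 2\left(\frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)\mu + \left(\frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2\right] \\
&\qquad - \frac{1}{2n\sigma_o^2\delta^2}\left[\delta^2u^2 + n\sigma_o^2\eta^2 - \frac{n(\delta^2u + \sigma_o^2\eta)^2}{n\delta^2 + \sigma_o^2}\right].
\end{aligned}$$

Finally, we obtain:

$$-\frac{(u - n\mu)^2}{2n\sigma_o^2} - \frac{(\mu - \eta)^2}{2\delta^2} = -\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left(\mu - \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2 - \frac{1}{2(n^2\delta^2 + n\sigma_o^2)}(u - n\eta)^2.$$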
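Therefore,

$$\begin{aligned}
f(u, \mu) &= \frac{1}{\sqrt{2\pi n\sigma_o^2}\,\sqrt{2\pi\delta^2}} \exp\left[-\frac{(u - n\mu)^2}{2n\sigma_o^2} - \frac{(\mu - \eta)^2}{2\delta^2}\right] \\
&= \frac{1}{\sqrt{2\pi n\sigma_o^2}\,\sqrt{2\pi\delta^2}} \exp\left[-\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left(\mu - \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2\right] \\
&\qquad \times \exp\left[-\frac{1}{2(n^2\delta^2 + n\sigma_o^2)}(u - n\eta)^2\right]
\end{aligned}$$

and

$$\begin{aligned}
m(u) &= \frac{\exp\left[-\frac{1}{2(n^2\delta^2 + n\sigma_o^2)}(u - n\eta)^2\right]}{\sqrt{2\pi n\sigma_o^2}\,\sqrt{2\pi\delta^2}} \int_{-\infty}^{\infty} \exp\left[-\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left(\mu - \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2\right] d\mu \\
&= \frac{\exp\left[-\frac{1}{2(n^2\delta^2 + n\sigma_o^2)}(u - n\eta)^2\right]}{\sqrt{2\pi n(n\delta^2 + \sigma_o^2)}} \int_{-\infty}^{\infty} \frac{\exp\left[-\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left(\mu - \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2\right]}{\sqrt{\dfrac{2\pi\sigma_o^2\delta^2}{n\delta^2 + \sigma_o^2}}}\, d\mu.
\end{aligned}$$

Recognizing the above integral as that of a normal density function and hence equal
to 1, we obtain that the marginal density function for $U$ is normal with mean $n\eta$ and
variance $(n^2\delta^2 + n\sigma_o^2)$. Further, the posterior density of $\mu$ given $U = u$ is

$$g^*(\mu \mid u) = \frac{f(u, \mu)}{m(u)} = \frac{1}{\sqrt{\dfrac{2\pi\sigma_o^2\delta^2}{n\delta^2 + \sigma_o^2}}} \exp\left[-\frac{n\delta^2 + \sigma_o^2}{2\sigma_o^2\delta^2}\left(\mu - \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2}\right)^2\right], \qquad -\infty < \mu < \infty,$$

a normal density with mean

$$\eta^* = \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2} \quad\text{and variance}\quad \delta^{*2} = \frac{\sigma_o^2\delta^2}{n\delta^2 + \sigma_o^2}.$$

It follows that the Bayes estimator for $\mu$ is

$$\hat{\mu}_B = \frac{\delta^2U + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2} = \frac{n\delta^2}{n\delta^2 + \sigma_o^2}\,\bar{Y} + \frac{\sigma_o^2}{n\delta^2 + \sigma_o^2}\,\eta.$$

Again, this Bayes estimator is a weighted average of the MLE, $\bar{Y}$ (the sample mean),
and the mean of the prior, $\eta$. As the size of the sample $n$ increases, the weight assigned
to the sample mean $\bar{Y}$ increases whereas the weight assigned to the prior mean $\eta$
decreases.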
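The weighted-average form of the Bayes estimators in Examples 16.3 and 16.4 can be checked numerically. The following is a minimal sketch in Python (a language chosen here for illustration; the text itself refers only to applets, R, and S-Plus). The function names and the small data values are hypothetical and are not taken from the text.

```python
def beta_bernoulli_estimate(successes, n, alpha, beta):
    """Posterior mean of p under a beta(alpha, beta) prior (Example 16.3)."""
    return (successes + alpha) / (n + alpha + beta)


def normal_posterior(u, n, sigma2, eta, delta2):
    """Posterior mean and variance of mu given U = u (Example 16.4).

    sigma2 is the known population variance; eta and delta2 are the
    mean and variance of the normal prior.
    """
    denom = n * delta2 + sigma2
    return (delta2 * u + sigma2 * eta) / denom, sigma2 * delta2 / denom


def weighted_average(sample_mean, prior_mean, weight_on_sample):
    """Convex combination of the sample mean and the prior mean."""
    return weight_on_sample * sample_mean + (1 - weight_on_sample) * prior_mean


# Hypothetical Bernoulli data: 7 successes in 20 trials, beta(1, 3) prior.
p_hat = beta_bernoulli_estimate(7, 20, 1, 3)
p_check = weighted_average(7 / 20, 1 / (1 + 3), 20 / (20 + 1 + 3))

# Hypothetical normal data: n = 10, sum of observations 52,
# known variance 4, normal prior with mean 3 and variance 1.
mu_hat, post_var = normal_posterior(52.0, 10, 4.0, 3.0, 1.0)
mu_check = weighted_average(52.0 / 10, 3.0, 10 * 1.0 / (10 * 1.0 + 4.0))

print(p_hat, p_check)
print(mu_hat, mu_check, post_var)
```

In both cases the direct formula and the weighted-average form agree, and increasing n in either call moves the weight toward the sample mean.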


Exercises


16.1 Refer to the results of Example 16.2 given in Table 16.1.

a Which of the two priors has the smaller variance?

b Compare the means and variances of the two posteriors associated with the beta (1, 3) prior.
Which of the posteriors has mean and variance that differ more from the mean and variance
of the beta (1, 3) prior?

c Answer the questions in parts (a) and (b) for the beta (10, 30) prior.

d Are your answers to parts (a)-(c) supported by the graphs presented in Figure 16.1(a)
and (b)?

e Compare the posteriors based on n = 5 for the two priors. Which of the two posteriors
has mean and variance that differ more from the mean and variance of the corresponding
priors?


16.2 Define each of the following:

a Prior distribution for a parameter $\theta$

b Posterior distribution for a parameter $\theta$

c Conjugate prior distribution

d Bayes estimator for a function of $\theta$, $t(\theta)$


16.3 Applet Exercise The applet Binomial Revision can be used to explore the impact of data and
the prior on the posterior distribution of the Bernoulli parameter p. The demonstration at the
top of the screen uses the beta prior with $\alpha = \beta = 1$.

a Click the button "Next Trial" to observe the result of taking a sample of size n = 1 from a
Bernoulli population with p = .4. Did you observe a success or a failure? Does the posterior
look different than the prior? Are the parameters of the posterior what you expected based
on the theoretical results of Example 16.1?

b Click the button "Next Trial" once again to observe the result of taking a sample of total
size n = 2 from a Bernoulli population with p = .4. How many successes and failures
have you observed so far? Does the posterior look different than the posterior that you
obtained in part (a)? Are the parameters of the posterior what you expected based on the
theoretical results of Example 16.1?

c Click the button "Next Trial" several times to observe the result of taking samples of larger
sizes from a Bernoulli population with p = .4. Pay attention to the mean and variance of
the posterior distributions that you obtain by taking successively larger samples. What do
you observe about the values of the means of the posteriors? What do you observe about
the standard deviations of posteriors based on larger sample sizes?

d On the initial demonstration on the applet, you were told that the true value of the Bernoulli
parameter is p = .4. The mean of the beta prior with $\alpha = \beta = 1$ is .5. How many trials
are necessary to obtain a posterior with mean close to .4, the true value of the Bernoulli
parameter?

e Click on the button "50 Trials" to see the effect of the results of an additional 50 trials on
the posterior. What do you observe about the shape of the posterior distributions based on
a large number of trials?


16.4 Applet Exercise Scroll down to the section "Applet with Controls" on the applet Binomial
Revision. Here, you can set the true value of the Bernoulli parameter p to any value 0 < p < 1
(any value of "real" interest) and you can also choose any $\alpha > 0$ and $\beta > 0$ as the values of the
parameters of the conjugate beta prior. What will happen if the true value of p = .1 and you
choose a beta prior with mean 1/4? In Example 16.1, one such set of values for $\alpha$ and $\beta$ was
illustrated: $\alpha = 1$, $\beta = 3$. Set up the applet to simulate sampling from a Bernoulli distribution
with p = .1 and use the beta (1, 3) prior. (Be sure to press Enter after entering the appropriate
values in the boxes.)

a Click the button "Next Trial" to observe the result of taking a sample of size n = 1 from
a Bernoulli population with p = .1. Did you observe a success or a failure? Does the
posterior look different than the prior?

b Click the button "Next Trial" once again to observe the result of taking a sample of total
size n = 2 from a Bernoulli population with p = .1. How many successes and failures
have you observed so far? Does the posterior look different than the posterior you obtained
in part (a)?

c If you observed a success on either of the first two trials, click the "Reset" button and start
over. Next, click the button "Next Trial" until you observe the first success. What happens
to the shape of the posterior upon observation of the first success?

d In this demonstration, we assumed that the true value of the Bernoulli parameter is p = .1.
The mean of the beta prior with $\alpha = 1$, $\beta = 3$ is .25. Click the button "Next Trial" until
you obtain a posterior that has mean close to .1. How many trials are necessary?


16.5 Repeat the directions in Exercise 16.4, using a beta prior with $\alpha = 10$, $\beta = 30$. How does the
number of trials necessary to obtain a posterior with mean close to .1 compare to the number
you found in Exercise 16.4(d)?


16.6 Suppose that Y is a binomial random variable based on n trials and success probability p (this
is the case for the virulent-disease example in Section 16.1). Use the conjugate beta prior with
parameters $\alpha$ and $\beta$ to derive the posterior distribution of $p \mid y$. Compare this posterior with
that found in Example 16.1.


16.7 In Section 16.1 and Exercise 16.6, we considered an example where the number of responders
to a treatment for a virulent disease in a sample of size n had a binomial distribution with
parameter p and used a beta prior for p with parameters $\alpha = 1$ and $\beta = 3$.

a Find the Bayes estimator for p, the proportion of those with the virulent disease who
respond to the therapy.

b Derive the mean and variance of the Bayes estimator found in part (a).


16.8 Refer to Exercise 16.6. If Y is a binomial random variable based on n trials and success
probability p and p has the conjugate beta prior with parameters $\alpha = 1$ and $\beta = 1$,

a determine the Bayes estimator for p, $\hat{p}_B$.

b what is another name for the beta distribution with $\alpha = 1$ and $\beta = 1$?

c find the mean square error (MSE) of the Bayes estimator found in part (a). [Hint: Recall
Exercise 8.17.]

d For what values of p is the MSE of the Bayes estimator smaller than that of the unbiased
estimator $\hat{p} = Y/n$?


16.9 Suppose that we conduct independent Bernoulli trials and record Y, the number of the trial
on which the first success occurs. As discussed in Section 3.5, the random variable Y has a
geometric distribution with success probability p. A beta distribution is again a conjugate prior
for p.

a If we choose a beta prior with parameters $\alpha$ and $\beta$, show that the posterior distribution of
$p \mid y$ is beta with parameters $\alpha^* = \alpha + 1$ and $\beta^* = \beta + y - 1$.

b Find the Bayes estimators for $p$ and $p(1 - p)$.


16.10 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponentially distributed population with
density $f(y \mid \theta) = \theta e^{-\theta y}$, $0 < y$. (Note: the mean of this population is $\mu = 1/\theta$.) Use the
conjugate gamma $(\alpha, \beta)$ prior for $\theta$ to do the following.

a Show that the joint density of $Y_1, Y_2, \ldots, Y_n, \theta$ is

$$f(y_1, y_2, \ldots, y_n, \theta) = \frac{\theta^{n+\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}} \exp\left[-\theta \Big/ \left(\frac{\beta}{\beta\sum y_i + 1}\right)\right].$$

b Show that the marginal density of $Y_1, Y_2, \ldots, Y_n$ is

$$m(y_1, y_2, \ldots, y_n) = \frac{\Gamma(n + \alpha)}{\Gamma(\alpha)\beta^{\alpha}} \left(\frac{\beta}{\beta\sum y_i + 1}\right)^{\alpha + n}.$$

c Show that the posterior density for $\theta \mid (y_1, y_2, \ldots, y_n)$ is a gamma density with parameters
$\alpha^* = n + \alpha$ and $\beta^* = \beta/(\beta\sum y_i + 1)$.

d Show that the Bayes estimator for $\mu = 1/\theta$ is

$$\hat{\mu}_B = \frac{\sum Y_i}{n + \alpha - 1} + \frac{1}{\beta(n + \alpha - 1)}.$$

[Hint: Recall Exercise 4.111(e).]

e Show that the Bayes estimator in part (d) can be written as a weighted average of $\bar{Y}$ and
the prior mean for $1/\theta$. [Hint: Recall Exercise 4.111(e).]

f Show that the Bayes estimator in part (d) is a biased but consistent estimator for $\mu = 1/\theta$.


16.11 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Poisson-distributed population with mean
$\lambda$. In this case, $U = \sum Y_i$ is a sufficient statistic for $\lambda$, and $U$ has a Poisson distribution with
mean $n\lambda$. Use the conjugate gamma $(\alpha, \beta)$ prior for $\lambda$ to do the following.

a Show that the joint likelihood of $U, \lambda$ is

$$L(u, \lambda) = \frac{n^u \lambda^{u+\alpha-1}}{u!\,\beta^{\alpha}\,\Gamma(\alpha)} \exp\left[-\lambda \Big/ \left(\frac{\beta}{n\beta + 1}\right)\right].$$

b Show that the marginal mass function of $U$ is

$$m(u) = \frac{n^u\,\Gamma(u + \alpha)}{u!\,\beta^{\alpha}\,\Gamma(\alpha)} \left(\frac{\beta}{n\beta + 1}\right)^{u + \alpha}.$$

c Show that the posterior density for $\lambda \mid u$ is a gamma density with parameters $\alpha^* = u + \alpha$
and $\beta^* = \beta/(n\beta + 1)$.

d Show that the Bayes estimator for $\lambda$ is

$$\hat{\lambda}_B = \frac{\left(\sum Y_i + \alpha\right)\beta}{n\beta + 1}.$$

e Show that the Bayes estimator in part (d) can be written as a weighted average of $\bar{Y}$ and
the prior mean for $\lambda$.

f Show that the Bayes estimator in part (d) is a biased but consistent estimator for $\lambda$.


16.12 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal population with known mean $\mu_o$
and unknown variance $1/\nu$. In this case, $U = \sum (Y_i - \mu_o)^2$ is a sufficient statistic for $\nu$, and
$W = \nu U$ has a $\chi^2$ distribution with n degrees of freedom. Use the conjugate gamma $(\alpha, \beta)$
prior for $\nu$ to do the following.

a Show that the joint density of $U, \nu$ is

$$f(u, \nu) = \frac{u^{(n/2)-1}\,\nu^{(n/2)+\alpha-1}}{\Gamma(\alpha)\Gamma(n/2)\,\beta^{\alpha}\,2^{n/2}} \exp\left[-\nu \Big/ \left(\frac{2\beta}{u\beta + 2}\right)\right].$$

b Show that the marginal density of $U$ is

$$m(u) = \frac{u^{(n/2)-1}}{\Gamma(\alpha)\Gamma(n/2)\,\beta^{\alpha}\,2^{n/2}} \left(\frac{2\beta}{u\beta + 2}\right)^{(n/2)+\alpha} \Gamma\left(\frac{n}{2} + \alpha\right).$$

c Show that the posterior density for $\nu \mid u$ is a gamma density with parameters $\alpha^* = (n/2) + \alpha$
and $\beta^* = 2\beta/(u\beta + 2)$.

d Show that the Bayes estimator for $\sigma^2 = 1/\nu$ is $\hat{\sigma}^2_B = (U\beta + 2)/[\beta(n + 2\alpha - 2)]$. [Hint:
Recall Exercise 4.111(e).]

e The MLE for $\sigma^2$ is $U/n$. Show that the Bayes estimator in part (d) can be written as a
weighted average of the MLE and the prior mean of $1/\nu$. [Hint: Recall Exercise 4.111(e).]


Bayesian Credible Intervals 


In previous sections, we have determined how to derive classical confidence intervals
for various parameters of interest. In our previous approach, the parameter of interest
$\theta$ had a fixed but unknown value. We constructed intervals by finding two random
variables $\hat{\theta}_L$ and $\hat{\theta}_U$, the lower and upper confidence limits, such that $\hat{\theta}_L < \hat{\theta}_U$ and
so that the probability that the random interval $(\hat{\theta}_L, \hat{\theta}_U)$ enclosed the fixed value $\theta$
was equal to the prescribed confidence coefficient $1 - \alpha$. We also considered how
to form one-sided confidence regions. The key realization in our pre-Bayesian work
was that the interval was random and the parameter was fixed. In Example 8.11, we
constructed a confidence interval for the mean of a normally distributed population
with unknown variance using the formula

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right) = \left[\bar{Y} - t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right),\ \bar{Y} + t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)\right].$$

In this case, the upper and lower endpoints of the interval are clearly random variables.
Upon obtaining data, calculating the realized values of the sample mean $\bar{y} = 2959$
and the sample standard deviation $s = 39.1$, and using $n = 8$ and $t_{.025} = 2.365$, we determined
that our realized confidence interval for the mean muzzle velocity for shells of the
type considered is (2926.3, 2991.7). This is a fixed interval that either contains the
true mean muzzle velocity or does not. We say that the interval is a 95% confidence
interval because if independent and separate samples, each of size n = 8, were taken
and the resulting (different) intervals were determined, in the long run, 95% of the
intervals would contain the true mean. The parameter is fixed, the endpoints of the
interval are random, and different samples will yield different realized intervals.

In the Bayesian context, the parameter $\theta$ is a random variable with posterior density
function $g^*(\theta)$. If we consider the interval $(a, b)$, the posterior probability that the
random variable $\theta$ is in this interval is

$$P^*(a \le \theta \le b) = \int_a^b g^*(\theta)\, d\theta.$$

If the posterior probability $P^*(a \le \theta \le b) = .90$, we say that $(a, b)$ is a 90% credible
interval for $\theta$.


EXAMPLE 16.5 


In Example 8.11, it was reasonable to assume that muzzle velocities were normally
distributed with unknown mean $\mu$. In that example, we assumed that the variance of
muzzle velocities $\sigma^2$ was unknown. Assume now that we are interested in forming
a Bayesian credible interval for $\mu$ and believe that there is a high probability that
the muzzle velocities will be within 30 feet per second of their mean $\mu$. Because
a normally distributed population is such that approximately 95% of its values are
within 2 standard deviations of its mean, it might be reasonable to assume that the
underlying distribution of muzzle velocities is normally distributed with mean $\mu$ and
variance $\sigma_o^2$ such that $2\sigma_o = 30$, that is, with $\sigma_o^2 = 225$.

If, prior to observing any data, we believed that there was a high probability that
$\mu$ was between 2700 and 2900, we might choose to use a conjugate normal prior for
$\mu$ with mean $\eta$ and variance $\delta^2$ chosen such that $\eta - 2\delta = 2700$ and $\eta + 2\delta = 2900$,
or $\eta = 2800$ and $\delta^2 = 50^2 = 2500$. Note that we have assumed considerably more
knowledge of muzzle velocities than we did in Example 8.11 where we assumed only
that muzzle velocities were normally distributed (with unknown variance). If we are
comfortable with this additional structure, we now take our sample of size n = 8 and
obtain the muzzle velocities given below:


3005 2925 2935 2965 
2995 3005 2937 2905

Use the general form for the posterior density for $\mu \mid u$ developed in Example 16.4 to
give a 95% credible interval for $\mu$.


Solution

This scenario is a special case of that dealt with in Example 16.4. In this application
of that general result,

$$n = 8, \quad u = \sum y_i = 23{,}672, \quad \sigma_o^2 = 225, \quad \eta = 2800, \quad \delta^2 = 2500.$$

In Example 16.4, we determined that the posterior density of $\mu \mid u$ is a normal density
with mean $\eta^*$ and variance $\delta^{*2}$ given by

$$\eta^* = \frac{\delta^2u + \sigma_o^2\eta}{n\delta^2 + \sigma_o^2} = \frac{(2500)(23672) + (225)(2800)}{8(2500) + 225} = 2957.23,$$

$$\delta^{*2} = \frac{\sigma_o^2\delta^2}{n\delta^2 + \sigma_o^2} = \frac{(225)(2500)}{8(2500) + 225} = 27.81.$$


Finally, recall that any normally distributed random variable $W$ with mean $\mu_W$ and
variance $\sigma_W^2$ is such that

$$P(\mu_W - 1.96\sigma_W \le W \le \mu_W + 1.96\sigma_W) = .95.$$

It follows that a 95% credible interval for $\mu$ is

$$(\eta^* - 1.96\,\delta^*,\ \eta^* + 1.96\,\delta^*) = (2957.23 - 1.96\sqrt{27.81},\ 2957.23 + 1.96\sqrt{27.81}) = (2946.89, 2967.57).$$


It is important to note that different individuals constructing credible intervals for
$\mu$ using the data in Example 16.5 will obtain different intervals if they choose different
values for any of the parameters $\eta$, $\delta^2$, and $\sigma_o^2$. Nevertheless, for the choices used in
Example 16.5, upon combining her prior knowledge with the information in the data,
the analyst can say that the posterior probability is .95 that the (random) $\mu$ is in the
(fixed) interval (2946.89, 2967.57).
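The arithmetic of Example 16.5 can be reproduced with a few lines of code. The sketch below is written in Python using only the standard library; this is a choice made here for illustration, and the variable names are assumptions rather than notation from the text. It uses the exact normal quantile in place of the rounded value 1.96, so the endpoints may differ from those above in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Muzzle velocities from Example 16.5
velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]

n = len(velocities)
u = sum(velocities)            # sufficient statistic U = sum of the y_i
sigma2 = 225.0                 # assumed known population variance
eta, delta2 = 2800.0, 2500.0   # mean and variance of the normal prior

# Posterior mean and variance from Example 16.4
post_mean = (delta2 * u + sigma2 * eta) / (n * delta2 + sigma2)
post_var = sigma2 * delta2 / (n * delta2 + sigma2)

# Equal-tailed 95% credible interval from the normal posterior
z = NormalDist().inv_cdf(0.975)
lower = post_mean - z * sqrt(post_var)
upper = post_mean + z * sqrt(post_var)

print(round(post_mean, 2), round(post_var, 2))
print(round(lower, 2), round(upper, 2))
```

Changing `eta`, `delta2`, or `sigma2` shows directly how the interval depends on the analyst's choices.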


EXAMPLE 16.6

In Exercise 16.10, it was stated that if $Y_1, Y_2, \ldots, Y_n$ denote a random sample from
an exponentially distributed population with density $f(y \mid \theta) = \theta e^{-\theta y}$, $0 < y$, and
the conjugate gamma prior (with parameters $\alpha$ and $\beta$) for $\theta$ was employed, then the
posterior density for $\theta$ is a gamma density with parameters $\alpha^* = n + \alpha$ and $\beta^* =
\beta/(\beta\sum y_i + 1)$. Assume that an analyst chose $\alpha = 3$ and $\beta = 5$ as appropriate
parameter values for the prior and that a sample of size n = 10 yielded that $\sum y_i = 1.26$.
Construct 90% credible intervals for $\theta$ and the mean of the population, $\mu = 1/\theta$.


Solution

In this application of the general result given in Exercise 16.10,

$$n = 10, \quad u = \sum y_i = 1.26, \quad \alpha = 3, \quad \beta = 5.$$

The resulting posterior density of $\theta$ is a gamma density with $\alpha^*$ and $\beta^*$ given by

$$\alpha^* = n + \alpha = 10 + 3 = 13,$$

$$\beta^* = \frac{\beta}{\beta\sum y_i + 1} = \frac{5}{5(1.26) + 1} = .685.$$

To complete our calculations, we need to find two values $a$ and $b$ such that

$$P^*(a \le \theta \le b) = .90.$$

If we do so, a 90% credible interval for $\theta$ is $(a, b)$. Further, because

$$a \le \theta \le b \quad\text{if and only if}\quad 1/b \le 1/\theta \le 1/a,$$

it follows that a 90% credible interval for $\mu = 1/\theta$ is $(1/b, 1/a)$.

Although we do not have a table giving probabilities associated with gamma-
distributed random variables with different parameter values, such probabilities can be
found using one of the applets accessible at www.thomsonedu.com/statistics/wackerly.
R, S-Plus, and other statistical software can also be used to compute probabilities
associated with gamma-distributed variables. Even so, there will be infinitely many
choices for $a$ and $b$ such that $P^*(a \le \theta \le b) = .90$. If we find values $a$ and $b$ such that

$$P^*(\theta \ge a) = .95 \quad\text{and}\quad P^*(\theta \ge b) = .05,$$

these values necessarily satisfy our initial requirement that $P^*(a \le \theta \le b) = .90$.

In our present application, we determined that $\theta$ has a gamma posterior with
parameters $\alpha^* = 13$ and $\beta^* = .685$. Using the applet Gamma Probabilities and Quantiles
on the Thomson website, we determine that

$$P^*(\theta \ge 5.2674) = .95 \quad\text{and}\quad P^*(\theta \ge 13.3182) = .05.$$

Thus, for the data observed and the prior that we selected, (5.2674, 13.3182) is a 90%
credible interval for $\theta$ whereas [1/(13.3182), 1/(5.2674)] = (.0751, .1898) is a
90% credible interval for $\mu = 1/\theta$.

The R (or S-Plus) command qgamma(.05,13,1/.685) also yields the value
a = 5.2674 given above, whereas qgamma(.95,13,1/.685) gives b = 13.3182.
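For readers without access to the applet or to R, the two quantiles in Example 16.6 can also be approximated with a short program. The following is a minimal sketch in Python (a language chosen here, not one used in the text), relying only on the standard library. It assumes an integer shape parameter, as is the case for $\alpha^* = 13$, so that the gamma distribution function reduces to a finite Poisson sum; the helper names `gamma_cdf_integer_shape` and `gamma_quantile` are hypothetical, and the quantiles are located by simple bisection.

```python
from math import exp


def gamma_cdf_integer_shape(x, shape, scale):
    """P(theta <= x) for a gamma density with integer shape.

    For integer shape k, P(theta > x) equals P(N <= k - 1), where N is
    Poisson with mean x / scale.
    """
    if x <= 0:
        return 0.0
    lam = x / scale
    term = exp(-lam)          # Poisson probability of 0
    total = term
    for j in range(1, shape):
        term *= lam / j       # Poisson probability of j
        total += term
    return 1.0 - total


def gamma_quantile(prob, shape, scale, tol=1e-10):
    """Value q with P(theta <= q) = prob, found by bisection."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf_integer_shape(hi, shape, scale) < prob:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gamma_cdf_integer_shape(mid, shape, scale) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


shape, scale = 13, 0.685      # posterior parameters alpha* and beta*
a = gamma_quantile(0.05, shape, scale)
b = gamma_quantile(0.95, shape, scale)

print(round(a, 4), round(b, 4))           # interval for theta
print(round(1 / b, 4), round(1 / a, 4))   # interval for mu = 1/theta
```

For a non-integer shape parameter this shortcut does not apply, and a statistical library or the applet would be needed instead.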

Exercises


16.13 Applet Exercise Activate the applet Binomial Revision and scroll down to the section labeled
"Credible Interval." Change the value of the Bernoulli proportion to 0.45 and the parameters
of the beta prior to $\alpha = 3$ and $\beta = 5$ and press Enter on your computer.

a What is the data-free credible interval for p based on the beta (3, 5) prior?

b Use the applet Beta Probabilities and Quantiles (accessible at www.thomsonedu.com/
statistics/wackerly) to calculate the prior probability that p is larger than the upper endpoint
of the interval that you obtained in part (a). Also calculate the probability that p is smaller
than the lower endpoint of the interval that you obtained in part (a).

c Based on your answers to part (b), what is the prior probability that p is in the interval
that you obtained in part (a)? Do you agree that the interval obtained in part (a) is a 95%
credible interval for p based on the beta (3, 5) prior?

d Click the button "Next Trial" once. Is the posterior based on the sample of size 1 different
than the prior? How does the posterior differ from the prior?

e What is a 95% credible interval based on the prior and the result of your sample of size 1?
Is it longer or shorter than the interval obtained (with no data) in part (a)?

f Click the button "Next Trial" once again. Compare the length of this interval (based on the
results of a sample of size 2) to the intervals obtained in parts (a) and (e).

g Use the applet Beta Probabilities and Quantiles to calculate the posterior probability that
p is larger than the upper endpoint of the interval that you obtained in part (f). Does the
value of this posterior probability surprise you?

h Click the button "Next Trial" several times. Describe how the posterior is changed by
additional data. What do you observe about the lengths of the credible intervals obtained
using posteriors based on larger sample sizes?


16.14 Applet Exercise Refer to Exercise 16.13. Select a value for the true value of the Bernoulli
proportion p and values for the parameters of the conjugate beta prior.

a Repeat Exercise 16.13(a)-(h), using the values you selected.

b Also click the button "50 Trials" a few times. Observe the values of the successive posterior
standard deviations and the lengths of the successive credible intervals.

i What do you observe about the standard deviations of the successive posterior
distributions?

ii Based on your answer to part (i), what effect do you expect to observe about the lengths
of successive credible intervals?

iii Did the lengths of the successive credible intervals behave as you anticipated in
part (ii)?


16.15 Applet Exercise In Exercise 16.7, we reconsidered our introductory example where the
number of responders to a treatment for a virulent disease in a sample of size n had a binomial
distribution with parameter p and used a beta prior for p with parameters $\alpha = 1$ and $\beta = 3$. We
subsequently found that, upon observing Y = y responders, the posterior density function for
$p \mid y$ is a beta density with parameters $\alpha^* = y + \alpha = y + 1$ and $\beta^* = n - y + \beta = n - y + 3$.
If we obtained a sample of size n = 25 that contained 4 people who responded to the new
treatment, find a 95% credible interval for p. [Use the applet Beta Probabilities and Quantiles
at www.thomsonedu.com/statistics/wackerly. Alternatively, if W is a beta-distributed random
variable with parameters $\alpha$ and $\beta$, the R (or S-Plus) command qbeta(p, α, β) gives the
value w such that $P(W \le w) = p$.]


16.16 Applet Exercise Repeat the instructions for Exercise 16.15, assuming a beta prior with
parameters $\alpha = 1$ and $\beta = 1$ [a prior that is uniform on the interval (0, 1)]. (See the
result of Exercise 16.8.) Compare this interval with the one obtained in Exercise 16.15.


16.17 Applet Exercise In Exercise 16.9, we used a beta prior with parameters $\alpha$ and $\beta$ and found the
posterior density for the parameter p associated with a geometric distribution. We determined
that the posterior distribution of $p \mid y$ is beta with parameters $\alpha^* = \alpha + 1$ and $\beta^* = \beta + y - 1$.
Suppose we used $\alpha = 10$ and $\beta = 5$ in our beta prior and observed the first success on trial 6.
Determine an 80% credible interval for p.


16.18 Applet Exercise In Exercise 16.10, we found the posterior density for $\theta$ based on a sample
of size n from an exponentially distributed population with mean $1/\theta$. Specifically, using the
gamma density with parameters $\alpha$ and $\beta$ as the prior for $\theta$, we found that the posterior density for
$\theta \mid (y_1, y_2, \ldots, y_n)$ is a gamma density with parameters $\alpha^* = n + \alpha$ and $\beta^* = \beta/(\beta\sum y_i + 1)$.
Assuming that a sample of size n = 15 produced a sample such that $\sum y_i = 30.27$ and
that the parameters of the gamma prior are $\alpha = 2.3$ and $\beta = 0.4$, use the applet Gamma
Probabilities and Quantiles to find 80% credible intervals for $\theta$ and $1/\theta$, the mean of the
exponential population.


16.19 Applet Exercise In Exercise 16.11, we found the posterior density for $\lambda$, the mean of a
Poisson-distributed population. Assuming a sample of size n and a conjugate gamma $(\alpha, \beta)$ prior for
$\lambda$, we showed that the posterior density of $\lambda \mid \sum y_i$ is gamma with parameters $\alpha^* = \sum y_i + \alpha$
and $\beta^* = \beta/(n\beta + 1)$. If a sample of size n = 25 is such that $\sum y_i = 174$ and the prior
parameters were ($\alpha = 2$, $\beta = 3$), use the applet Gamma Probabilities and Quantiles to find a
95% credible interval for $\lambda$.


16.20 Applet Exercise In Exercise 16.12, we used a gamma $(\alpha, \beta)$ prior for $\nu$ and a sample of size
n from a normal population with known mean $\mu_o$ and variance $1/\nu$ to derive the posterior for
$\nu$. Specifically, if $u = \sum (y_i - \mu_o)^2$, we determined the posterior of $\nu \mid u$ to be gamma with
parameters $\alpha^* = (n/2) + \alpha$ and $\beta^* = 2\beta/(u\beta + 2)$. If we choose the parameters of the prior
to be ($\alpha = 5$, $\beta = 2$) and a sample of size n = 8 yields the value u = .8579, use the applet
Gamma Probabilities and Quantiles to determine 90% credible intervals for $\nu$ and $1/\nu$, the
variance of the population from which the sample was obtained.


Bayesian Tests of Hypotheses 


Tests of hypotheses can also be approached from a Bayesian perspective. As we
have seen in previous sections, the Bayesian approach uses prior information about
a parameter and information in the data about that parameter to obtain the posterior
distribution. If, as in Section 10.11 where likelihood ratio tests were considered, we
are interested in testing that the parameter $\theta$ lies in one of two sets of values, $\Omega_0$ and
$\Omega_a$, we can use the posterior distribution of $\theta$ to calculate the posterior probability
that $\theta$ is in each of these sets of values. When testing $H_0: \theta \in \Omega_0$ versus $H_a: \theta \in \Omega_a$,
one often-used approach is to compute the posterior probabilities $P^*(\theta \in \Omega_0)$ and
$P^*(\theta \in \Omega_a)$ and accept the hypothesis with the higher posterior probability. That is,
for testing $H_0: \theta \in \Omega_0$ versus $H_a: \theta \in \Omega_a$,

accept $H_0$ if $P^*(\theta \in \Omega_0) > P^*(\theta \in \Omega_a)$,
accept $H_a$ if $P^*(\theta \in \Omega_a) > P^*(\theta \in \Omega_0)$.


EXAMPLE 16.7 


In Example 16.5, we obtained a 95% credible interval for the mean muzzle velocity
associated with shells prepared with a reformulated gunpowder. We assumed that
the associated muzzle velocities are normally distributed with mean $\mu$ and variance
$\sigma_o^2 = 225$ and that a reasonable prior density for $\mu$ is normal with mean $\eta = 2800$
and variance $\delta^2 = 2500$. We then used the data

3005 2925 2935 2965
2995 3005 2937 2905

to obtain that the posterior density for $\mu$ is normal with mean $\eta^* = 2957.23$ and
standard deviation $\delta^* = 5.274$. Conduct the Bayesian test for

$$H_0: \mu \le 2950 \quad\text{versus}\quad H_a: \mu > 2950.$$


Solution

In this case, if $Z$ has a standard normal distribution,

$$\begin{aligned}
P^*(\theta \in \Omega_0) = P^*(\mu \le 2950) &= P\left(Z \le \frac{2950 - \eta^*}{\delta^*}\right) = P\left(Z \le \frac{2950 - 2957.23}{5.274}\right) \\
&= P(Z \le -1.37) = .0853,
\end{aligned}$$

and $P^*(\theta \in \Omega_a) = P^*(\mu > 2950) = 1 - P^*(\mu \le 2950) = .9147$. Thus, we see that
the posterior probability of $H_a$ is much larger than the posterior probability of $H_0$ and
our decision is to accept $H_a: \mu > 2950$.
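The decision rule used in Example 16.7 amounts to evaluating the posterior distribution function at the boundary between the two hypotheses. A minimal sketch in Python follows (a choice of language made here; the text itself uses tables and applets), using the standard library class `statistics.NormalDist`. The variable names are assumptions introduced for illustration.

```python
from statistics import NormalDist

# Normal posterior for mu from Example 16.5
posterior = NormalDist(mu=2957.23, sigma=5.274)

# H0: mu <= 2950 versus Ha: mu > 2950
p_null = posterior.cdf(2950)   # posterior probability of H0
p_alt = 1 - p_null             # posterior probability of Ha

# Accept the hypothesis with the larger posterior probability
decision = "accept Ha" if p_alt > p_null else "accept H0"

print(round(p_null, 4), round(p_alt, 4), decision)
```

Because the comparison is between two posterior probabilities that sum to 1, the rule is equivalent to checking whether `p_null` falls below one half.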


Again, we note that if a different analyst uses the same data to conduct a Bayesian
test for the same hypotheses but different values for any of $\eta$, $\delta^2$, and $\sigma_o^2$, she will
obtain posterior probabilities of the hypotheses that are different than those obtained
in Example 16.7. Thus, different analysts with different choices of values for the prior
parameters might reach different conclusions.

In the frequentist settings discussed in the previous chapters, the parameter $\theta$ has
a fixed but unknown value, and any hypothesis is either true or false. If $\theta \in \Omega_0$,
then the null hypothesis is certainly true (with probability 1), and the alternative
is certainly false. If $\theta \in \Omega_a$, then the alternative hypothesis is certainly true (with
probability 1), and the null is certainly false. The only way we could know whether or
not $\theta \in \Omega_0$ is if we knew the true value of $\theta$. If this were the case, conducting a test of
hypotheses would be superfluous. For this reason, the frequentist makes no reference
to the probabilities of the hypotheses but focuses on the probability of a type I error, $\alpha$,
and the power of the test, power$(\theta) = 1 - \beta(\theta)$. Conversely, the frequentist concepts
of size and power are not of concern to an analyst using a Bayesian test.


EXAMPLE 16.8

In Example 16.6, we used a result given in Exercise 16.10 to obtain credible intervals
for $\theta$ and the population mean $\mu$ based on $Y_1, Y_2, \ldots, Y_n$, a random sample from an
exponentially distributed population with density $f(y \mid \theta) = \theta e^{-\theta y}$, $0 < y$. Using a
conjugate gamma prior for $\theta$ with parameters $\alpha = 3$ and $\beta = 5$, we obtained that the
posterior density for $\theta$ is a gamma density with parameters $\alpha^* = 13$ and $\beta^* = .685$.
Conduct the Bayesian test for

$$H_0: \mu > .12 \quad\text{versus}\quad H_a: \mu \le .12.$$


Solution

Since the mean of the exponential distribution is $\mu = 1/\theta$, the hypotheses are
equivalent to

$$H_0: \theta < 1/(.12) = 8.333 \quad\text{versus}\quad H_a: \theta \ge 8.333.$$

Because the posterior density for $\theta$ is a gamma density with parameters $\alpha^* = 13$ and
$\beta^* = .685$,

$$P^*(\theta \in \Omega_0) = P^*(\theta < 8.333) \quad\text{and}\quad P^*(\theta \in \Omega_a) = P^*(\theta \ge 8.333).$$

Using the applet Gamma Probabilities and Quantiles,

$$P^*(\theta \in \Omega_a) = P^*(\theta \ge 8.333) = 0.5570,$$

and

$$P^*(\theta \in \Omega_0) = P^*(\theta < 8.333) = 1 - P^*(\theta \ge 8.333) = 0.4430.$$

In this case, the posterior probability of $H_a$ is somewhat larger than the posterior
probability of $H_0$. It is up to the analyst to decide whether the probabilities are
sufficiently different to merit the decision to accept $H_a: \mu \le .12$.
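The two posterior probabilities in Example 16.8 can be approximated without the applet when the posterior shape parameter is an integer, as it is here. The sketch below is in Python with only the standard library (a choice made here for illustration); `gamma_upper_tail` is a hypothetical helper that uses the identity between an integer-shape gamma tail probability and a Poisson sum, and it would not apply to a non-integer shape.

```python
from math import exp


def gamma_upper_tail(x, shape, scale):
    """P(theta > x) for a gamma density with integer shape.

    Equals P(N <= shape - 1), where N is Poisson with mean x / scale.
    """
    lam = x / scale
    term = exp(-lam)          # Poisson probability of 0
    total = term
    for j in range(1, shape):
        term *= lam / j       # Poisson probability of j
        total += term
    return total


shape, scale = 13, 0.685      # posterior parameters alpha* and beta*
cutoff = 8.333                # boundary 1/(.12) between the hypotheses

p_alt = gamma_upper_tail(cutoff, shape, scale)   # P*(theta >= 8.333)
p_null = 1 - p_alt                               # P*(theta < 8.333)

print(round(p_alt, 3), round(p_null, 3))
```

The two probabilities are close, which is why the example leaves the final decision to the analyst's judgment.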

If you prefer to use Ё or S-Plus to compute the posterior probabilities of the 
hypotheses, pgamma (8.333,13,1/.685) yields P*(0 є ©) = P*(0 < 8.333) 
and P*(0 є Qa) = Р*(0 > 8.333) = 1 — P*(0 є Qo). ш 
Exercises

16.21 Applet Exercise In Exercise 16.15, we determined that the posterior density for $p$, the
proportion of responders to the new treatment for a virulent disease, is a beta density with parameters
$\alpha^* = 5$ and $\beta^* = 24$. What is the conclusion of a Bayesian test for $H_0: p < .3$ versus $H_a: p > .3$?
[Use the applet Beta Probabilities and Quantiles at www.thomsonedu.com/statistics/wackerly.
Alternatively, if $W$ is a beta-distributed random variable with parameters $\alpha$ and $\beta$,
the R or S-Plus command pbeta(w, $\alpha$, $\beta$) gives $P(W < w)$.]

16.22 Applet Exercise Exercise 16.16 used different prior parameters but the same data to determine
that the posterior density for $p$, the proportion of responders to the new treatment for a virulent
disease, is a beta density with parameters $\alpha^* = 5$ and $\beta^* = 22$. What is the conclusion of a
Bayesian test for $H_0: p < .3$ versus $H_a: p > .3$? Compare your conclusion to the one obtained
in Exercise 16.21.

16.23 Applet Exercise In Exercise 16.17, we obtained a beta posterior with parameters $\alpha^* = 11$ and
$\beta^* = 10$ for the parameter $p$ associated with a geometric distribution. What is the conclusion
of a Bayesian test for $H_0: p < .4$ versus $H_a: p > .4$?

16.24 Applet Exercise In Exercise 16.18, we found the posterior density for $\theta$ to be a gamma density
with parameters $\alpha^* = 17.3$ and $\beta^* = .0305$. Because the mean of the underlying exponential
population is $\mu = 1/\theta$, testing the hypotheses $H_0: \mu < 2$ versus $H_a: \mu > 2$ is equivalent to
testing $H_0: \theta > .5$ versus $H_a: \theta < .5$. What is the conclusion of a Bayesian test for these
hypotheses?

16.25 Applet Exercise In Exercise 16.19, we found the posterior density for $\lambda$, the mean of a Poisson-
distributed population, to be a gamma density with parameters $\alpha^* = 176$ and $\beta^* = .0395$. What
is the conclusion of a Bayesian test for $H_0: \lambda > 6$ versus $H_a: \lambda < 6$?

16.26 Applet Exercise In Exercise 16.20, we determined the posterior of $\nu \mid u$ to be a gamma density
with parameters $\alpha^* = 9$ and $\beta^* = 1.0765$. Recall that $\nu = 1/\sigma^2$, where $\sigma^2$ is the variance of the
underlying population that is normally distributed with known mean $\mu_0$. Testing the hypotheses
$H_0: \sigma^2 > 0.1$ versus $H_a: \sigma^2 < 0.1$ is equivalent to testing $H_0: \nu < 10$ versus $H_a: \nu > 10$.
What is the conclusion of a Bayesian test for these hypotheses?
Summary and Additional Comments


As we have seen in the previous sections, the key to Bayesian inferential methods 
(finding estimators, credible intervals, or implementing tests of hypotheses) is finding 
the posterior distribution of the parameter $\theta$. Especially when data are scarce, this
posterior is heavily dependent on the prior and the underlying distribution of the 
population from which the sample is taken. We have focused on the use of conjugate 
priors because of the resulting simplicity of finding the requisite posterior distribution 
of the parameter of interest. Of course, conjugate priors are not the only priors that 
can be used, but they do have the advantage of resulting in easy computations. This 
does not mean that a conjugate prior is necessarily the correct choice for the prior. 
Even if we correctly select the family from which the prior is taken (we have made 
repeated use of beta and gamma priors), there remains the difficulty of selecting the 
appropriate values associated with the parameters of the prior. We have seen, however, 
that the choice of the parameter values for the prior has decreasing impact for larger 
sample sizes. 

It is probably appropriate to make a few more comments about selecting values
of the parameters of the prior density. If we use a normal prior with mean $\nu$ and
variance $\delta^2$ and think that the population parameter is likely (unlikely) to be close to
$\nu$, we would use a relatively small (large) value for $\delta^2$. When using a beta prior with
parameters $\alpha$ and $\beta$ for a parameter that we thought had value close to $c$, we might
select $\alpha$ and $\beta$ such that the mean of the prior, $\alpha/(\alpha + \beta)$, equals $c$ and the variance of
the prior, $\alpha\beta/[(\alpha + \beta)^2(\alpha + \beta + 1)]$, is small. In the introductory example, we used
a beta prior with $\alpha = 1$ and $\beta = 3$ because we thought that about 25% of those given
the new treatment would favorably respond. The mean and standard deviation of this
prior are, respectively, .25 and .1936. Note that these are not the only choices
for $\alpha$ and $\beta$ that give .25 as the mean of the prior. In general, if $\alpha/(\alpha + \beta) = c$,
then for any $k > 0$, $\alpha' = k\alpha$ and $\beta' = k\beta$ also satisfy $\alpha'/(\alpha' + \beta') = c$. However,
for a beta density with parameters $\alpha' = k\alpha$ and $\beta' = k\beta$, the variance of the prior
is $\alpha'\beta'/[(\alpha' + \beta')^2(\alpha' + \beta' + 1)] = \alpha\beta/[(\alpha + \beta)^2(k\alpha + k\beta + 1)]$. Therefore, if our
initial choice of $\alpha$ and $\beta$ gives an appropriate value for the mean of the prior but we
prefer a smaller variance, we can achieve this by selecting some $k > 1$ and using
$\alpha' = k\alpha$ and $\beta' = k\beta$ as the prior parameters. Conversely, choosing some $k < 1$ and
using $\alpha' = k\alpha$ and $\beta' = k\beta$ as the prior parameters gives the same prior mean but
larger prior variance. Hence, a more vague prior results from choosing small values
of $\alpha$ and $\beta$ that are such that $\alpha/(\alpha + \beta) = c$, the desired prior mean.
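The effect of the scaling factor $k$ on a beta prior can be checked numerically. The sketch below is illustrative only; the function name and the particular values of $k$ are assumptions chosen for the demonstration, applied to the beta prior with parameters 1 and 3 from the introductory example.

```python
from fractions import Fraction

def beta_mean_var(a, b):
    """Exact mean and variance of a beta density with parameters a and b."""
    a, b = Fraction(a), Fraction(b)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Scale the prior parameters (1, 3) by k: the mean stays at 1/4,
# while the standard deviation shrinks as k grows.
for k in (Fraction(1, 2), 1, 4):
    mean, var = beta_mean_var(k * 1, k * 3)
    print(f"k = {k}: mean = {mean}, sd = {float(var) ** 0.5:.4f}")
```

Values of $k$ below 1 give a more vague prior and values above 1 a more concentrated one, with the prior mean unchanged.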

One of the steps in determining the posterior is to determine the marginal distribu-
tion of the data. For continuous priors, this is accomplished by integrating the joint
likelihood of the data and the parameter over the region of support for the prior. In
our previous work, we denoted the resulting marginal mass or density function for
the random variables $Y_1, Y_2, \ldots, Y_n$ in a sample of size $n$ as $m(y_1, y_2, \ldots, y_n)$ or as
$m(u)$ if $U$ is a sufficient statistic for $\theta$. This marginal mass or density function is
called the predictive mass or density function of the data. We have explicitly given
these predictive distributions in all of our applications. This is because, to paraphrase
Berger (1985, p. 95), interest in the predictive distribution centers on the fact that this
is the distribution according to which the data will actually occur. As discussed in Box
(1980, pp. 385-386), potential evidence of inappropriate model selection is provided
by the predictive distribution of the data, not the posterior distribution for the pa-
rameter. Some expert Bayesian analysts choose to model the predictive distribution
directly and select the prior that leads to the requisite predictive distribution. The
Reverend Thomas Bayes (1764) used a uniform (0, 1) prior for the Bernoulli (or
binomial) parameter $p$ because this prior leads to the predictive distribution that he
thought to be most appropriate. Additional comments relevant to the choice of some
prior parameters can be found in Kepner and Wackerly (2002).

The preceding paragraph notwithstanding, it is true that there is a shortcut to finding
the all-important posterior density for $\theta$. As previously indicated, if
$L(y_1, y_2, \ldots, y_n \mid \theta)$ is the conditional likelihood of the data and $\theta$ has continuous prior density $g(\theta)$,
then the posterior density of $\theta$ is

$$g^*(\theta \mid y_1, y_2, \ldots, y_n) = \frac{L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)}{\int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta}.$$

Notice that the denominator on the right-hand side of the expression depends on
$y_1, y_2, \ldots, y_n$, but does not depend on $\theta$. (Definite integration with respect to $\theta$
produces a result that is free of $\theta$.) Realizing that, with respect to $\theta$, the denominator
is a constant, we can write

$$g^*(\theta \mid y_1, y_2, \ldots, y_n) = c(y_1, y_2, \ldots, y_n)\, L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta),$$

where

$$c(y_1, y_2, \ldots, y_n) = \frac{1}{\int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta}$$

does not depend on $\theta$. Further, notice that, because the posterior density is a bona fide
density function, the quantity $c(y_1, y_2, \ldots, y_n)$ must be such that

$$\int_{-\infty}^{\infty} g^*(\theta \mid y_1, y_2, \ldots, y_n)\, d\theta = c(y_1, y_2, \ldots, y_n) \int_{-\infty}^{\infty} L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta)\, d\theta = 1.$$

Finally, we see that the posterior density is proportional to the product of the condi-
tional likelihood of the data and the prior density for $\theta$:

$$g^*(\theta \mid y_1, y_2, \ldots, y_n) \propto L(y_1, y_2, \ldots, y_n \mid \theta) \times g(\theta),$$

where the proportionality constant is chosen so that the integral of the posterior density
function is 1. We illustrate by reconsidering Example 16.1.


EXAMPLE 16.9 


Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Bernoulli distribution where
$P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$ and assume that the prior distribution for $p$
is beta ($\alpha$, $\beta$). Find the posterior distribution for $p$.

Solution

As before,

$$L(y_1, y_2, \ldots, y_n \mid p)\, g(p) = p(y_1, y_2, \ldots, y_n \mid p)\, g(p) = p^{\sum y_i}(1 - p)^{n - \sum y_i} \left[\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1}(1 - p)^{\beta - 1}\right],$$

$$g^*(p \mid y_1, y_2, \ldots, y_n) \propto p^{\sum y_i + \alpha - 1}(1 - p)^{n - \sum y_i + \beta - 1}.$$

From the above, we recognize that the resultant posterior for $p$ must be beta with
parameters $\alpha^* = \sum y_i + \alpha$ and $\beta^* = n - \sum y_i + \beta$.


What was the advantage of finding the previous posterior using this “proportion- 
ality” argument? Considerably less work! Disadvantage? We never exhibited the 
predictive mass function for the data and lost the opportunity to critique the Bayesian 
model. 
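The proportionality argument can also be checked numerically. The following sketch is an illustration under stated assumptions: the sample `ys` and the prior parameters are hypothetical, and the normalizing constant is approximated with a simple midpoint rule rather than computed in closed form. It normalizes the kernel $p^{\sum y_i + \alpha - 1}(1 - p)^{n - \sum y_i + \beta - 1}$ and compares the result with the beta density having parameters $\alpha^*$ and $\beta^*$.

```python
import math

def beta_posterior_params(ys, alpha, beta):
    """Conjugate update for Bernoulli data with a beta(alpha, beta) prior."""
    s = sum(ys)
    return alpha + s, beta + len(ys) - s

def beta_pdf(p, a, b):
    """Beta density evaluated at p, using log-gamma for the constant."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def normalized_kernel(p, ys, alpha, beta, grid=20000):
    """Likelihood times prior kernel, divided by its integral over (0, 1)."""
    s, n = sum(ys), len(ys)

    def kernel(q):
        return q ** (s + alpha - 1) * (1 - q) ** (n - s + beta - 1)

    h = 1.0 / grid
    area = sum(kernel((i + 0.5) * h) for i in range(grid)) * h  # midpoint rule
    return kernel(p) / area

ys = [1, 0, 0, 1, 0]                       # hypothetical Bernoulli sample
a_star, b_star = beta_posterior_params(ys, 1, 3)
print(a_star, b_star)
print(normalized_kernel(0.3, ys, 1, 3), beta_pdf(0.3, a_star, b_star))
```

The two printed density values should agree to several decimal places, which is the content of the proportionality argument.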

Priors other than conjugate priors could well be more appropriate in specific ap- 
plications. The posterior is found using the same procedure given in Section 16.2, but 
we might obtain a posterior distribution with which we are unfamiliar. Finding the 
mean of the posterior, credible intervals, and the probabilities of relevant hypotheses 
could be more problematic. For the examples in the previous sections, we obtained 
posteriors with which we were well acquainted. Posterior means were easy to find be- 
cause we had already determined properties of normal, beta- and gamma-distributed 
random variables. Additionally, tables for these posteriors were readily available (in 
the appendix or easily accessed with many software packages). There is an ever- 
emerging set of computer procedures in which the posterior is determined based on 
user input of the likelihood function for the data and the prior for the parameter. 
Once the posterior is obtained via use of the software, this posterior is used exactly 
as previously described. 

Bayes estimators can be evaluated using classical frequentist criteria. We have 
already seen that Bayes estimators are biased. However, they are usually consistent 
and, depending on the criteria used, can be superior to the corresponding frequentist 
estimators. In Exercise 16.8, you determined that the MSE of the Bayes estimator 
was sometimes smaller than the MSE of the unbiased MLE. Further, the influence of 
the choice of the prior parameter values decreases as the size of the sample increases. 

In Example 8.11, we determined that the realized frequentist confidence interval
for the mean of a normally distributed population was (2926.3, 2991.7). Using the
frequentist perspective, the true population mean is fixed but unknown. As a result,
this realized interval either captures the true value of $\mu$ or it does not. We said that
this interval was a 95% confidence interval because the procedure (formula) used to
produce it yields intervals that do capture the fixed mean about 95% of the time if
samples of size 8 are repeatedly and independently taken and used to construct many
intervals. If 100 samples of size 8 are taken and used to produce (different) realized
confidence intervals, we expect approximately 95 of them to capture the parameter.
We do not know which of the 100 intervals capture the unknown fixed mean. The
same data were used in Example 16.5 to obtain (2946.89, 2967.57) as a 95% credible
interval for $\mu$, now viewed as a random variable. From the Bayesian perspective, it
makes full sense to state that the posterior probability is .95 that the (random) mean
is in this (fixed) interval.
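The repeated-sampling interpretation of a confidence interval can be illustrated by simulation. The sketch below is not taken from the examples cited above: the population mean and standard deviation are hypothetical, the seed is arbitrary, and the value 2.365 is the tabled upper .025 quantile of the $t$ distribution with 7 degrees of freedom, so it applies only to samples of size 8.

```python
import random
import statistics

N = 8                # sample size; the t quantile below is specific to N - 1 = 7 df
T_975_DF7 = 2.365    # tabled upper .025 quantile of t with 7 df

def coverage(mu, sigma, reps=2000, seed=1):
    """Fraction of realized 95% t intervals that capture the fixed mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(N)]
        ybar = statistics.mean(sample)
        half_width = T_975_DF7 * statistics.stdev(sample) / N ** 0.5
        hits += (ybar - half_width <= mu <= ybar + half_width)
    return hits / reps

rate = coverage(mu=2950.0, sigma=40.0)   # hypothetical population values
print(rate)
```

Each simulated interval either captures the fixed mean or it does not; only the long-run proportion of captures is close to .95.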

The goodness of classical hypothesis tests is measured by $\alpha$ and $\beta$, the probabilities
of type I and type II errors, respectively. If tests with $\alpha = .05$ are repeatedly (using
different, independently selected samples) implemented, then when $H_0$ is true, $H_0$ is
rejected 5% of the time. If $H_0$ is really true and 100 samples of the same size are
independently taken, we expect to reject the (true) null hypothesis about five times. It
makes no sense to even try to compute the probabilities of the hypotheses. From the
Bayesian perspective, the parameter of interest is a random variable with posterior
distribution derived by the analyst. Computing the posterior probabilities for each of
the hypotheses is completely appropriate and is the basis for the decision in a Bayesian
test.

Which is the better approach, Bayesian or frequentist? It is impossible to provide 
a universal answer to this question. In some applications, the Bayesian approach will 
be superior; in others, the frequentist approach is better. 
A1.11 Other Useful Mathematical Results


Matrices and Matrix Algebra 


The following presentation represents a very elementary and condensed discussion 
of matrices and matrix operations. If you seek a more comprehensive introduction 
to the subject, consult the books listed in the references indicated at the end of 
Chapter 11. 

We will define a matrix as a rectangular array (arrangement) of real numbers and 
will indicate specific matrices symbolically with bold capital letters. The numbers 
in the matrix, elements, appear in specific row-column positions, all of which are 
filled. The number of rows and columns may vary from one matrix to another, so 
we conveniently describe the size of a matrix by giving its dimensions—that is, the 
number of its rows and columns. Thus matrix A

$$\underset{2 \times 3}{\mathbf{A}} = \begin{bmatrix} 6 & 0 & -1 \\ 4 & 2 & 7 \end{bmatrix}$$

possesses dimensions 2 × 3 because it contains two rows and three columns. Similarly, for

$$\underset{4 \times 1}{\mathbf{B}} = \begin{bmatrix} 1 \\ -3 \\ 0 \\ 7 \end{bmatrix} \quad \text{and} \quad \underset{2 \times 2}{\mathbf{C}} = \begin{bmatrix} 2 & 0 \\ -1 & 4 \end{bmatrix}$$

the dimensions of B and C are 4 × 1 and 2 × 2, respectively. Note that the row
dimension always appears first and that the dimensions may be written below the
identifying symbol of the matrix as indicated for matrices A, B, and C.

As in ordinary algebra, an element of a matrix may be indicated by a symbol,
$a, b, \ldots$, and its row-column position identified by means of a double subscript.
Thus $a_{21}$ would be the element in the second row, first column. Rows are numbered
in order from top to bottom and columns from left to right. In matrix A, $a_{21} = 4$,
$a_{13} = -1$, and so on.

Elements in a particular row are identified by their column subscript and hence 
are numbered from left to right. The first element in a row is on the left. Likewise, 
elements in a particular column are identified by their row subscript and therefore are 
identified from the top element in the column to the bottom. For example, the first 
element in column 2 of matrix A is 0, the second is 2. The first, second, and third 
elements of row 1 are 6, 0, and — 1, respectively. 

The term matrix algebra involves, as the name implies, an algebra dealing with 
matrices, much as the ordinary algebra deals with real numbers or symbols represent- 
ing real numbers. Hence, we will wish to state rules for the addition and multiplication 
of matrices as well as to define other elements of an algebra. In so doing we will point 
out the similarities as well as the dissimilarities between matrix and ordinary algebra. 
Finally, we will use our matrix operations to state and solve a very simple matrix 
equation. This, as you may suspect, will be the solution that we desire for the least
squares equations. 


Addition of Matrices 


Two matrices, say A and B, can be added only if they are of the same dimensions. The 
sum of the two matrices will be a matrix obtained by adding corresponding elements 
of matrices А and B—that is, elements in corresponding positions. This being the 
case, the resulting sum will be a matrix of the same dimensions as A and B. 


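EXAMPLE A1.1

Find the indicated sum of matrices A and B:

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 4 \\ -1 & 6 & 0 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 0 & -1 & 1 \\ 6 & -3 & 2 \end{bmatrix}.$$

Solution

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} 2 & 1 & 4 \\ -1 & 6 & 0 \end{bmatrix} + \begin{bmatrix} 0 & -1 & 1 \\ 6 & -3 & 2 \end{bmatrix} = \begin{bmatrix} (2 + 0) & (1 - 1) & (4 + 1) \\ (-1 + 6) & (6 - 3) & (0 + 2) \end{bmatrix} = \begin{bmatrix} 2 & 0 & 5 \\ 5 & 3 & 2 \end{bmatrix}.$$

EXAMPLE A1.2

Find the sum of the matrices

$$\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 0 & 3 \\ 1 & -1 & 4 \\ 2 & -1 & 0 \end{bmatrix} \quad \text{and} \quad \underset{3 \times 3}{\mathbf{B}} = \begin{bmatrix} 4 & 2 & -1 \\ 1 & 0 & 6 \\ 3 & 1 & 4 \end{bmatrix}.$$

Solution

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} 5 & 2 & 2 \\ 2 & -1 & 10 \\ 5 & 0 & 4 \end{bmatrix}.$$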


Note that (A + B) = (B + A), as in ordinary algebra, and remember that we never 
add matrices of unlike dimensions. 


A1.3 Multiplication of a Matrix 
by a Real Number 


We desire a rule for multiplying a matrix by a real number, for example, 3A, where

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 4 & 6 \\ -1 & 0 \end{bmatrix}.$$

Certainly we would want 3A to equal (A + A + A), to conform with the addition rule.
Hence, 3A would mean that each element in the A matrix must be multiplied by the
multiplier 3, and

$$3\mathbf{A} = \begin{bmatrix} 3(2) & 3(1) \\ 3(4) & 3(6) \\ 3(-1) & 3(0) \end{bmatrix} = \begin{bmatrix} 6 & 3 \\ 12 & 18 \\ -3 & 0 \end{bmatrix}.$$

In general, given a real number $c$ and a matrix A with elements $a_{ij}$, the product $c\mathbf{A}$
will be a matrix whose elements are equal to $c\,a_{ij}$.
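The addition rule and the rule for multiplying by a real number can be sketched with matrices stored as lists of rows. The function names are illustrative and not part of the text; the sketch confirms that 3A equals A + A + A for the matrix above.

```python
def mat_add(A, B):
    """Add two matrices elementwise; both must have the same dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Multiply every element of A by the real number c."""
    return [[c * a for a in row] for row in A]

A = [[2, 1], [4, 6], [-1, 0]]
three_A = scalar_mul(3, A)
print(three_A)
print(mat_add(mat_add(A, A), A) == three_A)
```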


A1.4 Matrix Multiplication 


The rule for matrix multiplication requires “row-column multiplication,” which we 
will define subsequently. The procedure may seem a bit complicated to the novice 
but should not prove too difficult after practice. We will illustrate with an example. 
Let A and B be

$$\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}.$$


An element in the ith row and jth column of the product AB is obtained by mul- 
tiplying the ith row of A by the jth column of B. Thus the element in the first row, 
first column of AB is obtained by multiplying the first row of A by the first column 
of B. Likewise, the element in the first row, second column would be the product of 
the first row of A and the second column of B. Notice that we always use the rows of 
A and the columns of B, where A is the matrix to the left of B in the product AB. 

Row-column multiplication is relatively easy. Obtain the products, first-row ele- 
ment by first-column element, second-row element by second-column element, third 
by third, and so on, and then sum. Remember that row and column elements are 
marked from left to right and top to bottom, respectively. 

Applying these rules to our example, we obtain

$$\underset{2 \times 2}{\mathbf{A}}\;\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 10 & 4 \\ 1 & 14 \end{bmatrix}.$$

The first-row-first-column product would be (2)(5) + (0)(−1) = 10, which is located
in the first row, first column of AB. Likewise, the element in the first row,
second column is equal to the product of the first row of A and the second column of B,
or (2)(2) + (0)(3) = 4. The second-row-first-column product is (1)(5) + (4)(−1) = 1
and is located in the second row, first column of AB. Finally, the second-row-second-
column product is (1)(2) + (4)(3) = 14.
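Row-column multiplication translates directly into code. The sketch below uses an illustrative function name and lists of rows; it multiplies the two matrices of this example in both orders and refuses to multiply when the rows of the left matrix and the columns of the right matrix have different lengths.

```python
def mat_mul(A, B):
    """Row-column product AB for matrices stored as lists of rows."""
    m, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("a row of A and a column of B must have equal length")
    q = len(B[0])
    # Element (i, j) is the sum of products of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(q)]
            for i in range(m)]

A = [[2, 0], [1, 4]]
B = [[5, 2], [-1, 3]]
AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)
print(BA)
```

Comparing the two printed products shows that the order of the factors matters.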


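EXAMPLE A1.3

Find the products AB and BA, where

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \\ 0 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 4 & -1 & -1 \\ 2 & 0 & 2 \end{bmatrix}.$$

Solution

$$\underset{3 \times 2}{\mathbf{A}}\;\underset{2 \times 3}{\mathbf{B}} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 4 & -1 & -1 \\ 2 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 10 & -2 & 0 \\ 2 & -1 & -3 \\ 8 & 0 & 8 \end{bmatrix}$$

and

$$\underset{2 \times 3}{\mathbf{B}}\;\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 4 & -1 & -1 \\ 2 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & -1 \\ 0 & 4 \end{bmatrix} = \begin{bmatrix} 7 & 1 \\ 4 & 10 \end{bmatrix}.$$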


Note that in matrix algebra, unlike ordinary algebra, AB does not equal BA. Be- 
cause A contains three rows and B contains three columns, we can form (3)(3) = 9 
row-column combinations and hence nine elements for AB. In contrast, B contains 
only two rows, A two columns, and hence the product BA will possess only (2)(2) = 4 
elements, corresponding to the four different row-column combinations. 

Furthermore, we observe that row-column multiplication is predicated on the as- 
sumption that the rows of the matrix on the left contain the same number of elements 
as the columns of the matrix on the right, so that corresponding elements will exist
for the row-column multiplication. What do we do when this condition is not satis- 
fied? We agree never to multiply two matrices, say AB, where the rows of A and the 
columns of B contain an unequal number of elements. 

An examination of the dimensions of the matrices will tell whether they can be 
multiplied as well as give the dimensions of the product. Writing the dimensions 
underneath the two matrices, 

$$\underset{m \times p}{\mathbf{A}}\;\underset{p \times q}{\mathbf{B}} = \underset{m \times q}{\mathbf{AB}}$$

we observe that the inner two numbers, giving the number of elements in a row of A 
and column of B, respectively, must be equal. The outer two numbers, indicating the 
number of rows of A and columns of B, give the dimensions of the product matrix. 
You may verify the operation of this rule for Example A1.3. 


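EXAMPLE A1.4

Obtain the product AB:

$$\underset{1 \times 3}{\mathbf{A}}\;\underset{3 \times 2}{\mathbf{B}} = \begin{bmatrix} 2 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 3 \end{bmatrix}.$$

Note that product AB is (1 × 2) and that BA is undefined because of the respective
dimensions of A and B.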


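EXAMPLE A1.5

Find the product AB, where

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}.$$

Solution

$$\underset{1 \times 4}{\mathbf{A}}\;\underset{4 \times 1}{\mathbf{B}} = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = [30].$$

Note that this example produces a different method for writing a sum of squares.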


A1.5 Identity Elements

The identity elements for addition and multiplication in ordinary algebra are 0 and
1, respectively. In addition, 0 plus any other element, say $a$, is identically equal to $a$;
that is,

$$0 + 2 = 2, \qquad 0 + (-9) = -9.$$

Similarly, the multiplication of the identity element 1 by any other element, say $a$, is
equal to $a$; that is,

$$(1)(5) = 5, \qquad (1)(-4) = -4.$$


In matrix algebra two matrices are said to be equal when all corresponding elements 
are equal. With this in mind we will define the identity matrices in a manner similar 
to that employed in ordinary algebra. Hence, if A is any matrix, a matrix B will be an 
identity matrix for addition if 


A+B=A and B+A=A. 


It easily can be seen that the identity matrix for addition is one in which every element 
is equal to zero. This matrix is of interest but of no practical importance in our work. 

Similarly, if A is any matrix, the identity matrix for multiplication is a matrix I 
that satisfies the relation 


$$\mathbf{AI} = \mathbf{A} \quad \text{and} \quad \mathbf{IA} = \mathbf{A}.$$


This matrix, called the identity matrix, is the square matrix 


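$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$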


That is, all elements in the main diagonal of the matrix, running from top left to 
bottom right, are equal to 1; all other elements equal zero. Note that the identity 
matrix is always indicated by the symbol I. 

Unlike ordinary algebra, which contains only one identity element for multipli- 
cation, matrix algebra must contain an infinitely large number of identity matrices. 
Thus we must have matrices with dimensions 1 × 1, 2 × 2, 3 × 3, 4 × 4, and so on, so
as to provide an identity of the correct dimensions to permit multiplication. All will 
be of this pattern. 

That the I matrix satisfies the relation 


$$\mathbf{IA} = \mathbf{AI} = \mathbf{A}$$


can be shown by an example. 


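EXAMPLE A1.6

Let

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 6 & 3 \end{bmatrix}.$$

Show that IA = A and AI = A.

Solution

$$\underset{2 \times 2}{\mathbf{I}}\;\underset{2 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \\ -1 & 6 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 6 & 3 \end{bmatrix} = \mathbf{A}$$

and

$$\underset{2 \times 3}{\mathbf{A}}\;\underset{3 \times 3}{\mathbf{I}} = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 6 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 6 & 3 \end{bmatrix} = \mathbf{A}.$$


A1.6 The Inverse of a Matrix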


For matrix algebra to be useful, we must be able to construct and solve matrix equations 
for a matrix of unknowns in a manner similar to that employed in ordinary algebra. 
This, in turn, requires a method of performing division. 

For example, we would solve the simple equation in ordinary algebra, 


$$2x = 6$$


by dividing both sides of the equation by 2 and obtaining $x = 3$. Another way to view
this operation is to define the reciprocal of each element in an algebraic system and
to think of division as multiplication by the reciprocal of an element. We could solve
the equation $2x = 6$ by multiplying both sides of the equation by the reciprocal of
2. Because every element in the real number system possesses a reciprocal, with the
exception of 0, the multiplication operation eliminates the need for division.

The reciprocal of a number $c$ in ordinary algebra is a number $b$ that satisfies the
relation


$$cb = 1;$$


that is, the product of a number by its reciprocal must equal the identity element for 
multiplication. For example, the reciprocal of 2 is 1/2 and (2)(1/2) = 1. 

A reciprocal in matrix algebra is called the inverse of a matrix and is defined as 
follows: 


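DEFINITION A1.1

Let $\mathbf{A}_{p \times p}$ be a square matrix. If a matrix $\mathbf{A}^{-1}$ can be found such that

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \quad \text{and} \quad \mathbf{A}^{-1}\mathbf{A} = \mathbf{I},$$

then $\mathbf{A}^{-1}$ is called the inverse of A.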


Note that the requirement for an inverse in matrix algebra is the same as in ordinary 
algebra—that is, the product of A by its inverse must equal the identity matrix for 
multiplication. Furthermore, the inverse is undefined for nonsquare matrices, and 
hence many matrices in matrix algebra do not have inverses (recall that 0 was the only 
element in the real number system without an inverse). Finally, we state without proof 
that many square matrices do not possess inverses. Those that do will be identified in 
Section A1.9, and a method will be given for finding the inverse of a matrix. 
The Transpose of a Matrix


We have just discussed a relationship between a matrix and its inverse. A second 
useful matrix relationship defines the transpose of a matrix. 


Let Apxq be a matrix of dimensions p x q. Then A’, called the transpose of 
A, is defined to be a matrix obtained by interchanging corresponding rows and 
columns of А; that is, first with first, second with second, and so on. 


For example, let 


Then 


Note that the first and second rows of A’ are identical with the first and second 
columns, respectively, of A. 
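As a second example, let

$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}.$$

Then $\mathbf{Y}' = [y_1 \;\; y_2 \;\; y_3]$. As a point of interest, we observe that $\mathbf{Y}'\mathbf{Y} = \sum_{i=1}^{3} y_i^2$.
Finally, if

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 4 \\ 0 & 2 & 3 \\ 1 & 6 & 9 \end{bmatrix}$$

then

$$\mathbf{A}' = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 2 & 6 \\ 4 & 3 & 9 \end{bmatrix}.$$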
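Transposition is easy to sketch in code. In the illustration below the function name is an assumption, and the numerical values placed in Y are hypothetical; the matrix A is the 3 × 3 matrix just shown.

```python
def transpose(A):
    """Interchange rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

A = [[2, 1, 4], [0, 2, 3], [1, 6, 9]]
At = transpose(A)
print(At)

# Y'Y written out as a sum of squares (hypothetical values for y1, y2, y3).
Y = [[1], [2], [3]]
Yt = transpose(Y)                        # a 1 x 3 row matrix
yty = sum(a * row[0] for a, row in zip(Yt[0], Y))
print(yty)
```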


A Matrix Expression for a System 
of Simultaneous Linear Equations 


We will now introduce you to one of the very simple and important applications of
matrix algebra. Let

$$2v_1 + v_2 = 5$$
$$v_1 - v_2 = 1$$

be a pair of simultaneous linear equations in the two variables, $v_1$ and $v_2$. We will
then define three matrices:

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{V} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$

Note that A is the matrix of coefficients of the unknowns when the equations are
each written with the variables appearing in the same order, reading left to right, and
with the constants on the right-hand side of the equality sign. The V matrix gives the
unknowns in a column and in the same order as they appear in the equations. Finally,
the G matrix contains the constants in a column exactly as they occur in the set of
equations.

The simultaneous system of two linear equations may now be written in matrix
notation as

$$\mathbf{AV} = \mathbf{G},$$

a statement that can easily be verified by multiplying A and V and then comparing
the answer with G.

$$\mathbf{AV} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 2v_1 + v_2 \\ v_1 - v_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix} = \mathbf{G}.$$

Observe that corresponding elements in AV and G are equal; that is, $2v_1 + v_2 = 5$
and $v_1 - v_2 = 1$. Therefore, AV = G.

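The method for writing a pair of linear equations in two unknowns as a matrix
equation can easily be extended to a system of $r$ equations in $r$ unknowns. For example,
if the equations are

$$\begin{aligned}
a_{11}v_1 + a_{12}v_2 + a_{13}v_3 + \cdots + a_{1r}v_r &= g_1 \\
a_{21}v_1 + a_{22}v_2 + a_{23}v_3 + \cdots + a_{2r}v_r &= g_2 \\
a_{31}v_1 + a_{32}v_2 + a_{33}v_3 + \cdots + a_{3r}v_r &= g_3 \\
&\;\;\vdots \\
a_{r1}v_1 + a_{r2}v_2 + a_{r3}v_3 + \cdots + a_{rr}v_r &= g_r
\end{aligned}$$

define

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1r} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2r} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3r} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & a_{r3} & \cdots & a_{rr} \end{bmatrix}, \qquad \mathbf{V} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_r \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ \vdots \\ g_r \end{bmatrix}.$$

Observe that, once again, A is a square matrix of variable coefficients, whereas V
and G are column matrices containing the variables and constants, respectively. Then
AV = G.

Regardless of how large the system of equations, if we possess $n$ linear equations
in $n$ unknowns, the system may be written as the simple matrix equation AV = G.

You will observe that the matrix V contains all the unknowns, whereas A and G
are constant matrices.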
Our objective, of course, is to solve for the matrix of unknowns, V, where the
equation AV = G is similar to the equation

$$2v = 6$$

in ordinary algebra. This being true, we would not be too surprised to find that the
methods of solutions are the same. In ordinary algebra both sides of the equation
are multiplied by the reciprocal of 2; in matrix algebra both sides of the equation are
multiplied by $\mathbf{A}^{-1}$. Then

$$\mathbf{A}^{-1}(\mathbf{AV}) = \mathbf{A}^{-1}\mathbf{G}$$
or
$$\mathbf{A}^{-1}\mathbf{AV} = \mathbf{A}^{-1}\mathbf{G}.$$

But $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$ and $\mathbf{IV} = \mathbf{V}$. Therefore, $\mathbf{V} = \mathbf{A}^{-1}\mathbf{G}$. In other words, the solutions to
the system of simultaneous linear equations can be obtained by finding $\mathbf{A}^{-1}$ and then
obtaining the product $\mathbf{A}^{-1}\mathbf{G}$. The solution values of $v_1, v_2, v_3, \ldots, v_r$ will appear in
sequence in the column matrix $\mathbf{V} = \mathbf{A}^{-1}\mathbf{G}$.
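For the two-equation system $2v_1 + v_2 = 5$, $v_1 - v_2 = 1$, the relation $\mathbf{V} = \mathbf{A}^{-1}\mathbf{G}$ can be sketched as follows. The inverse here is obtained from the standard closed-form expression for a 2 × 2 matrix, which is an assumption of this sketch and not the row-operation method developed in the text; the function names are likewise illustrative.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the determinant formula (exact arithmetic)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(A, v):
    """Product of a matrix with a column of values given as a flat list."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 1], [1, -1]]      # coefficients of the unknowns
G = [5, 1]                 # constants on the right-hand side
A_inv = inverse_2x2(A)
V = mat_vec(A_inv, G)      # solution values v1, v2
print(V)
```

Substituting the printed values back into both equations is a quick check on the result.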


Inverting a Matrix 


We have indicated in Section A1.8 that the key to the solutions of a system of simul- 
taneous linear equations by the method of matrix algebra rests on the acquisition of 
the inverse of the A matrix. Many methods exist for inverting matrices. The method 
that we present is not the best from a computational point of view, but it works very 
well for the matrices associated with most experimental designs and it is one of the 
easiest to present to the novice. It depends upon a theorem in matrix algebra and the 
use of row operations. 

Before defining row operations on matrices, we must state what is meant by the 
addition of two rows of a matrix and the multiplication of a row by a constant. We 
will illustrate with the A matrix for the system of two simultaneous linear equations, 


$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}.$$


Two rows of a matrix may be added by adding corresponding elements. Thus if 
the two rows of the A matrix are added, one obtains a new row with elements [(2 + 1) 
(1— 1)] = [3 0]. Multiplication of a row by a constant means that each element in the 
row is multiplied by the constant. Twice the first row of the А matrix would generate 
the row [4 2]. With these ideas in mind, we will define three ways to operate on a 
row in a matrix: 


1. A row may be multiplied by a constant. 


2. A row may be multiplied by a constant and added to or subtracted from another
row (which is identified as the one upon which the operation is performed).


3. Two rows may be interchanged. 


Given matrix A, it is quite easy to see that we might perform a series of row
operations that would yield some new matrix B. In this connection we state without
proof a surprising and interesting theorem from matrix algebra; namely, there exists
some matrix C such that

$$\mathbf{CA} = \mathbf{B}.$$

In other words, a series of row operations on a matrix A is equivalent to multiplying
A by a matrix C. We will use this principle to invert a matrix.

Place the matrix A, which is to be inverted, alongside an identity matrix of the
same dimensions:

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Then perform the same row operations on A and I in such a way that A changes
to an identity matrix. In doing so, we must have multiplied A by a matrix C so that
CA = I. Therefore, C must be the inverse of A! The problem, of course, is to find the
unknown matrix C and, fortunately, this proves to be of little difficulty. Because we
performed the same row operations on A and I, the identity matrix must have changed
to $\mathbf{CI} = \mathbf{C} = \mathbf{A}^{-1}$.

$$\begin{array}{ccc}
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix} & & \mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
\downarrow & \text{(same row operations)} & \downarrow \\
\mathbf{CA} = \mathbf{I} & & \mathbf{CI} = \mathbf{C} = \mathbf{A}^{-1}
\end{array}$$

We will illustrate with the following example. 


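EXAMPLE A1.7

Invert the matrix

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}.$$

Solution

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Step 1. Operate on row 1 by multiplying row 1 by 1/2. (Note: It is helpful to the
beginner to identify the row upon which he or she is operating because all other rows
will remain unchanged, even though they may be used in the operation. We will star
the row upon which the operation is being performed.)

$$\begin{array}{c} * \\ \phantom{*} \end{array} \begin{bmatrix} 1 & 1/2 \\ 1 & -1 \end{bmatrix} \qquad \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix}$$

Step 2. Operate on row 2 by subtracting row 1 from row 2.

$$\begin{array}{c} \phantom{*} \\ * \end{array} \begin{bmatrix} 1 & 1/2 \\ 0 & -3/2 \end{bmatrix} \qquad \begin{bmatrix} 1/2 & 0 \\ -1/2 & 1 \end{bmatrix}$$

(Note that row 1 is simply used to operate on row 2 and hence remains unchanged.)

Step 3. Multiply row 2 by (−2/3).

$$\begin{array}{c} \phantom{*} \\ * \end{array} \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1/2 & 0 \\ 1/3 & -2/3 \end{bmatrix}$$

Step 4. Operate on row 1 by multiplying row 2 by 1/2 and subtracting from row 1.

$$\begin{array}{c} * \\ \phantom{*} \end{array} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1/3 & 1/3 \\ 1/3 & -2/3 \end{bmatrix}$$

(Note that row 2 is simply used to operate on row 1 and hence remains unchanged.)
Hence the inverse of A must be

$$\mathbf{A}^{-1} = \begin{bmatrix} 1/3 & 1/3 \\ 1/3 & -2/3 \end{bmatrix}.$$

A ready check on the calculations for the inversion procedure is available because
$\mathbf{A}^{-1}\mathbf{A}$ must equal the identity matrix I. Thus

$$\mathbf{A}^{-1}\mathbf{A} = \begin{bmatrix} 1/3 & 1/3 \\ 1/3 & -2/3 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
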
EXAMPLE A1.8 Invert the matrix 
2 0 1 
A=]1 —1 2 
1 0 


and check the results. 


Solution p) 0 1 100 
А= |1 -1 2 1= 1010 
1 0 0 0 1 


Step 1. Multiply row 1 by 1/2. 


«| 1 0 1/2 1/2 0 0 
1 -1 2 0 1 0 
1 0 0 0 0 1 
Step 2. Operate on row 2 by subtracting row 1 from row 2. 
1 0 1/2 1/2 0 0 
*! 0 -1 3/2 —1/2 1 0 |. 
1 0 0 0 0 1 
Step 3. Operate on row 3 by subtracting row 1 from row 3. 
1 0 1/2 1/2 0 0 
0 —1 3/2 —1/2 1 0 
«| 0 0 —1/2 —1/2 0 1 
Step 4. Operate on row 2 by multiplying row 3 by 3 and adding to row 2. 
1 0 1/2 1/2 0 0 
*|0 —1 0 —2 1 3 
0 0 —1/2 —1/2 0 1 
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Step 5. Multiply row 2 by (— 1). 


1 0 1/2 1/2 0 0 
*|0 1 0 2 —1 -3 
0 0 —1/2 —1/2 0 1 
Step 6. Operate on row | by adding row 3 to row 1. 
*|1 0 0 0 0 1 
0 1 0 2 —1 -3 
0 0 —1/2 —1/2 0 1 
Step 7. Multiply row 3 by (—2). 
1 0 0 0 0 1 
0 1 0 2 -1 -3 | =A". 
*|0 0 1 1 0 —2 


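The seven row operations have changed the A matrix to the identity matrix and,
barring errors of calculation, have changed the identity to $A^{-1}$.
Checking, we have

\[
A^{-1}A =
\begin{bmatrix} 0 & 0 & 1 \\ 2 & -1 & -3 \\ 1 & 0 & -2 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 1 \\ 1 & -1 & 2 \\ 1 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

We see that $A^{-1}A = I$ and hence that the calculations are correct.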
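
The check in Example A1.8 can also be carried out mechanically. The following is a minimal sketch in Python (the helper name `matmul` is an illustrative assumption, not something defined in the text); it multiplies the inverse found above by A and compares the product with the identity matrix.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Matrix A and the inverse obtained in Example A1.8.
A = [[2, 0, 1],
     [1, -1, 2],
     [1, 0, 0]]
A_inv = [[0, 0, 1],
         [2, -1, -3],
         [1, 0, -2]]
identity = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

product = matmul(A_inv, A)
print(product == identity)
```

Because every entry here is an integer, the comparison is exact; with fractional entries one would typically switch to `fractions.Fraction` to avoid rounding.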


Note that the sequence of row operations required to convert A to I is not unique. 
One person might achieve the inverse by using five row operations whereas another 
might require ten, but the end result will be the same. However, in the interests of 
efficiency it is desirable to employ a system. 

Observe that the inversion process utilizes row operations to change off-diagonal 
elements in the A matrix to 0s and the main diagonal elements to 1s. One systematic
procedure is as follows. Change the top left element into a 1 and then perform row 
operations to change all other elements in the first column to 0. Then move to the 
diagonal element in the second row, second column, change it into a 1, and change all 
elements in the second column below the main diagonal to 0. This process is repeated, 
moving down the main diagonal from top left to bottom right, until all elements below 
the main diagonal have been changed to 0s. To eliminate nonzero elements above the
main diagonal, operate on all elements in the last column, changing each to 0; then 
move to the next to last column and repeat the process. Continue this procedure until 
you arrive at the first element in the first column, which was the starting point. This 
procedure is indicated diagrammatically in Figure A1.1. 
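
The systematic procedure just described can be sketched in code. This is a minimal illustration under stated assumptions, not a production routine: the function name `invert` is hypothetical, exact arithmetic is done with `fractions.Fraction`, and a row interchange is added for the case of a zero diagonal element, a situation the text does not discuss.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by row operations on the pair [A | I]."""
    n = len(matrix)
    # Place the identity matrix to the right of A.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    # Forward pass: move down the main diagonal, make each diagonal
    # element 1, and clear the elements below it.
    for k in range(n):
        pivot = next((r for r in range(k, n) if aug[r][k] != 0), None)
        if pivot is None:
            raise ValueError("matrix has no inverse")
        aug[k], aug[pivot] = aug[pivot], aug[k]  # assumption: swap rows if needed
        scale = aug[k][k]
        aug[k] = [v / scale for v in aug[k]]
        for r in range(k + 1, n):
            factor = aug[r][k]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[k])]
    # Backward pass: start at the last column and clear the elements
    # above the main diagonal, moving toward the first column.
    for k in range(n - 1, 0, -1):
        for r in range(k):
            factor = aug[r][k]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]

print(invert([[2, 0, 1], [1, -1, 2], [1, 0, 0]]))
```

Applied to the matrix of Example A1.8, the sketch follows a different sequence of row operations from the seven steps shown there, which illustrates the remark that the sequence is not unique while the end result is the same.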

Matrix inversion is a tedious process, at best, and requires every bit as much labor 
as the solution of a system of simultaneous equations by elimination or substitution.
You will be pleased to learn that we do not expect you to develop a facility for matrix 
inversion. Fortunately, most matrices associated with designed experiments follow 
patterns and are easily inverted. 
FIGURE A1.1 Procedure for matrix inversion


It will be beneficial to you to invert a few 2 x 2 and 3 x 3 matrices. Matrices lacking 
pattern, particularly large matrices, are inverted most efficiently and economically by 
using a computer. (Programs for matrix inversion have been developed for most 
computers.) 

We emphasize that obtaining the solutions for the least squares equations 
(Chapter 11) by matrix inversion has distinct advantages that may or may not be 
apparent. Not the least of these is the fact that the inversion procedure is systematic 
and hence is particularly suitable for electronic computation. However, the major 
advantage is that the inversion procedure will automatically produce the variances of 
the estimators of all parameters in the linear model. 

Before leaving the topic of matrix inversion, we ask how one may identify a matrix 
that has an inverse. Reference to a discussion of linear equations in ordinary algebra 
should reveal the answer. 

Clearly, a unique solution for a system of simultaneous linear equations cannot
be obtained unless the equations are independent. Thus if one of the equations is a 
linear combination of the others, the equations are dependent. Coefficient matrices 
associated with dependent systems of linear equations do not possess an inverse. 
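
One way to test for dependence numerically relies on a standard fact that is not stated in the text: a square matrix has an inverse exactly when its determinant is nonzero. The sketch below assumes that fact; the function name `determinant` and the dependent example matrix are illustrative choices.

```python
def determinant(m):
    """Determinant by cofactor expansion along the first row.

    Adequate for the small matrices considered here; it is not
    efficient for large matrices.
    """
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total

independent = [[2, 1], [1, -1]]   # coefficients of 2v1 + v2, v1 - v2
dependent = [[2, 1], [4, 2]]      # hypothetical: row 2 is twice row 1

print(determinant(independent), determinant(dependent))
```

A zero determinant signals that one row is a linear combination of the others, so the row operations of the inversion procedure would produce a row of zeros and no inverse exists.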


Solving a System of Simultaneous 
Linear Equations 


We have finally obtained all the ingredients necessary for solving a system of
simultaneous linear equations,

\[
\begin{aligned}
2v_1 + v_2 &= 5 \\
v_1 - v_2 &= 1.
\end{aligned}
\]

Recalling that the matrix solution to the system of equations AV = G is $V = A^{-1}G$,
we obtain

\[
V = \begin{bmatrix} 1/3 & 1/3 \\ 1/3 & -2/3 \end{bmatrix}
\begin{bmatrix} 5 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 1 \end{bmatrix}.
\]

Hence the solution is

\[
V = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix},
\]

that is, $v_1 = 2$ and $v_2 = 1$, a fact that may be verified by substitution of these values
in the original linear equations.
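
As a small illustration of the matrix solution, the product of the inverse and the right-hand-side vector can be computed directly. This is a sketch only; the variable names are illustrative, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

# Inverse of the coefficient matrix [[2, 1], [1, -1]].
A_inv = [[Fraction(1, 3), Fraction(1, 3)],
         [Fraction(1, 3), Fraction(-2, 3)]]
G = [5, 1]  # right-hand sides of the two equations

# V = A^{-1} G, computed one row at a time.
V = [sum(a * g for a, g in zip(row, G)) for row in A_inv]
v1, v2 = V

# Substitute back into the original equations.
print(V, 2 * v1 + v2 == 5, v1 - v2 == 1)
```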


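EXAMPLE A1.9 Solve the system of simultaneous linear equations

\[
\begin{aligned}
2v_1 + v_3 &= 4 \\
v_1 - v_2 + 2v_3 &= 2 \\
v_1 &= 1.
\end{aligned}
\]


Solution The coefficient matrix for these equations,

\[
A = \begin{bmatrix} 2 & 0 & 1 \\ 1 & -1 & 2 \\ 1 & 0 & 0 \end{bmatrix},
\]

is the matrix inverted in Example A1.8, where we found

\[
A^{-1} = \begin{bmatrix} 0 & 0 & 1 \\ 2 & -1 & -3 \\ 1 & 0 & -2 \end{bmatrix}.
\]

Solving, we obtain

\[
V = A^{-1}G =
\begin{bmatrix} 0 & 0 & 1 \\ 2 & -1 & -3 \\ 1 & 0 & -2 \end{bmatrix}
\begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}.
\]

Thus $v_1 = 1$, $v_2 = 3$, and $v_3 = 2$ give the solution to the set of three simultaneous
linear equations.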


A1.11 


Other Useful Mathematical Results 


The purpose of this section is to provide the reader with a convenient reference 
to some of the key mathematical results that are used frequently in the body of the 
text. 
The Binomial Expansion of $(x + y)^n$ Let x and y be any real numbers; then

\[
(x + y)^n = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2
+ \cdots + \binom{n}{n} x^0 y^n
= \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i .
\]

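The Sum of a Geometric Series Let r be a real number such that $|r| < 1$, and let m be
any integer $m \ge 1$. Then

\[
\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}, \qquad
\sum_{i=1}^{\infty} r^i = \frac{r}{1 - r}, \qquad
\sum_{i=0}^{m} r^i = \frac{1 - r^{m+1}}{1 - r}.
\]
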


The (Taylor) Series Expansion of $e^x$ Let x be any real number; then

\[
e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}.
\]

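
The three series results above lend themselves to a quick numerical check. The following sketch is illustrative only; the function names and the test values of x, y, r, and m are assumptions chosen for the example.

```python
import math

def binomial_expansion(x, y, n):
    """Right-hand side of the binomial expansion of (x + y)^n."""
    return sum(math.comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))

def geometric_partial_sum(r, m):
    """Sum of r^i for i = 0, 1, ..., m, added term by term."""
    return sum(r ** i for i in range(m + 1))

def exp_series(x, terms=30):
    """Partial sum of the Taylor series for e^x using the first `terms` terms."""
    return sum(x ** i / math.factorial(i) for i in range(terms))

print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)
print(geometric_partial_sum(0.5, 10), (1 - 0.5 ** 11) / (1 - 0.5))
print(exp_series(1.0), math.e)
```

For $|r| < 1$ the partial sums approach $1/(1 - r)$ as m grows, which is the infinite-series form given above.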
Some useful formulas for particular summations follow. The proofs (omitted) are 
most easily established by using mathematical induction. 


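\[
\sum_{i=1}^{n} i = \frac{n(n + 1)}{2}, \qquad
\sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6}, \qquad
\sum_{i=1}^{n} i^3 = \left( \frac{n(n + 1)}{2} \right)^2 .
\]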
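
These summation formulas can be spot-checked by brute force. The sketch below is not a proof (the text points to induction for that); the helper name `power_sum` and the range of n tested are arbitrary choices.

```python
def power_sum(n, k):
    """Sum of i^k for i = 1, 2, ..., n, computed directly."""
    return sum(i ** k for i in range(1, n + 1))

# Compare the direct sums with the closed forms for n = 1, ..., 50.
all_match = all(
    power_sum(n, 1) == n * (n + 1) // 2
    and power_sum(n, 2) == n * (n + 1) * (2 * n + 1) // 6
    and power_sum(n, 3) == (n * (n + 1) // 2) ** 2
    for n in range(1, 51)
)
print(all_match)
```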


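Gamma Function Let $t > 0$; then $\Gamma(t)$ is defined by the following integral:

\[
\Gamma(t) = \int_0^{\infty} y^{t-1} e^{-y} \, dy .
\]

Using the technique of integration by parts, it follows that for any $t > 0$

\[
\Gamma(t + 1) = t\,\Gamma(t)
\]

and if $t = n$, where n is an integer,

\[
\Gamma(n) = (n - 1)! .
\]

Further,

\[
\Gamma(1/2) = \sqrt{\pi} .
\]

If $\alpha, \beta > 0$, the Beta function, $B(\alpha, \beta)$, is defined by the following integral,

\[
B(\alpha, \beta) = \int_0^1 y^{\alpha - 1} (1 - y)^{\beta - 1} \, dy
\]

and is related to the gamma function as follows:

\[
B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)} .
\]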
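
The gamma and beta relations can be explored with Python's `math.gamma`. The sketch below is illustrative: `beta_function` and `beta_integral` are hypothetical helper names, and the integral is approximated with a simple midpoint rule, which is adequate when $\alpha, \beta \ge 1$ so that the integrand is bounded.

```python
import math

def beta_function(a, b):
    """B(a, b) computed from the gamma-function relation."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_integral(a, b, steps=100_000):
    """Midpoint-rule approximation of the defining integral of B(a, b)."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(steps))

print(math.gamma(5))                           # compare with 4! = 24
print(math.gamma(0.5), math.sqrt(math.pi))     # Gamma(1/2) and sqrt(pi)
print(beta_function(2, 3), beta_integral(2, 3))
```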


APPENDIX 2 


Common Probability Distributions, Means, Variances, and Moment-Generating Functions


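Table 1 Discrete Distributions

| Distribution | Probability Function | Mean | Variance | Moment-Generating Function |
|---|---|---|---|---|
| Binomial | $p(y) = \binom{n}{y} p^y (1 - p)^{n-y}$; $y = 0, 1, \ldots, n$ | $np$ | $np(1 - p)$ | $[pe^t + (1 - p)]^n$ |
| Geometric | $p(y) = p(1 - p)^{y-1}$; $y = 1, 2, \ldots$ | $\dfrac{1}{p}$ | $\dfrac{1 - p}{p^2}$ | $\dfrac{pe^t}{1 - (1 - p)e^t}$ |
| Hypergeometric | $p(y) = \dfrac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$; $y = 0, 1, \ldots, n$ if $n \le r$, $y = 0, 1, \ldots, r$ if $n > r$ | $\dfrac{nr}{N}$ | $n\left(\dfrac{r}{N}\right)\left(\dfrac{N - r}{N}\right)\left(\dfrac{N - n}{N - 1}\right)$ | does not exist in closed form |
| Poisson | $p(y) = \dfrac{\lambda^y e^{-\lambda}}{y!}$; $y = 0, 1, 2, \ldots$ | $\lambda$ | $\lambda$ | $\exp[\lambda(e^t - 1)]$ |
| Negative binomial | $p(y) = \binom{y-1}{r-1} p^r (1 - p)^{y-r}$; $y = r, r + 1, \ldots$ | $\dfrac{r}{p}$ | $\dfrac{r(1 - p)}{p^2}$ | $\left[\dfrac{pe^t}{1 - (1 - p)e^t}\right]^r$ |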
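
The mean and variance columns of Table 1 can be checked by summing directly over a probability function. The sketch below is a rough illustration for two of the rows; the function names, the parameter values, and the truncation of the Poisson support at 100 terms are all assumptions made for the example.

```python
import math

def binomial_pmf(y, n, p):
    """Binomial probability function p(y)."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def poisson_pmf(y, lam):
    """Poisson probability function p(y)."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def mean_and_variance(pmf, support):
    """Mean and variance obtained by summing over the given support."""
    mean = sum(y * pmf(y) for y in support)
    variance = sum((y - mean) ** 2 * pmf(y) for y in support)
    return mean, variance

# Binomial with n = 10, p = 0.3: the table gives np = 3 and np(1 - p) = 2.1.
print(mean_and_variance(lambda y: binomial_pmf(y, 10, 0.3), range(11)))

# Poisson with lambda = 2.5, support truncated at y = 99 (assumption):
# the table gives mean and variance both equal to lambda.
print(mean_and_variance(lambda y: poisson_pmf(y, 2.5), range(100)))
```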
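Table 2 Continuous Distributions

| Distribution | Probability Function | Mean | Variance | Moment-Generating Function |
|---|---|---|---|---|
| Uniform | $f(y) = \dfrac{1}{\theta_2 - \theta_1}$; $\theta_1 \le y \le \theta_2$ | $\dfrac{\theta_1 + \theta_2}{2}$ | $\dfrac{(\theta_2 - \theta_1)^2}{12}$ | $\dfrac{e^{t\theta_2} - e^{t\theta_1}}{t(\theta_2 - \theta_1)}$ |
| Normal | $f(y) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\left(\dfrac{1}{2\sigma^2}\right)(y - \mu)^2\right]$; $-\infty < y < +\infty$ | $\mu$ | $\sigma^2$ | $\exp\left(\mu t + \dfrac{t^2\sigma^2}{2}\right)$ |
| Exponential | $f(y) = \dfrac{1}{\beta} e^{-y/\beta}$; $\beta > 0$; $0 < y < \infty$ | $\beta$ | $\beta^2$ | $(1 - \beta t)^{-1}$ |
| Gamma | $f(y) = \left[\dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\right] y^{\alpha - 1} e^{-y/\beta}$; $0 < y < \infty$ | $\alpha\beta$ | $\alpha\beta^2$ | $(1 - \beta t)^{-\alpha}$ |
| Chi-square | $f(y) = \dfrac{y^{(\nu/2) - 1} e^{-y/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$; $y > 0$ | $\nu$ | $2\nu$ | $(1 - 2t)^{-\nu/2}$ |
| Beta | $f(y) = \left[\dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\right] y^{\alpha - 1} (1 - y)^{\beta - 1}$; $0 < y < 1$ | $\dfrac{\alpha}{\alpha + \beta}$ | $\dfrac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$ | does not exist in closed form |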
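
The moment-generating function column ties back to the mean and variance columns through the standard relations $m'(0) = E(Y)$ and $m''(0) = E(Y^2)$. The sketch below approximates those derivatives with central differences for the gamma row; the function names, the step size `h`, and the parameter values $\alpha = 2$, $\beta = 3$ are assumptions chosen for illustration.

```python
def gamma_mgf(t, alpha, beta):
    """Moment-generating function of the gamma distribution (t < 1/beta)."""
    return (1 - beta * t) ** (-alpha)

def moments_from_mgf(mgf, h=1e-4):
    """Approximate the mean and variance from derivatives of the MGF at t = 0."""
    first = (mgf(h) - mgf(-h)) / (2 * h)                 # approximates m'(0)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2  # approximates m''(0)
    return first, second - first ** 2

# Table 2 gives mean alpha*beta = 6 and variance alpha*beta^2 = 18.
mean, variance = moments_from_mgf(lambda t: gamma_mgf(t, alpha=2.0, beta=3.0))
print(mean, variance)
```

The finite-difference step trades truncation error against rounding error, so the results agree with the table only approximately.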
APPENDIX 3


Tables


Table 1 Binomial Probabilities


Tabulated values are $P(Y \le a) = \sum_{y=0}^{a} p(y)$. (Computations are rounded at third decimal place.)

(a) n = 5

                                          p
 a   0.01  0.05  0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  0.95  0.99   a
 0   .951  .774  .590  .328  .168  .078  .031  .010  .002  .000  .000  .000  .000   0
 1   .999  .977  .919  .737  .528  .337  .188  .087  .031  .007  .000  .000  .000   1
 2  1.000  .999  .991  .942  .837  .683  .500  .317  .163  .058  .009  .001  .000   2
 3  1.000 1.000 1.000  .993  .969  .913  .812  .663  .472  .263  .081  .023  .001   3
 4  1.000 1.000 1.000 1.000  .998  .990  .969  .922  .832  .672  .410  .226  .049   4

(b) n = 10

                                          p
 a   0.01  0.05  0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  0.95  0.99   a
 0   .904  .599  .349  .107  .028  .006  .001  .000  .000  .000  .000  .000  .000   0
 1   .996  .914  .736  .376  .149  .046  .011  .002  .000  .000  .000  .000  .000   1
 2  1.000  .988  .930  .678  .383  .167  .055  .012  .002  .000  .000  .000  .000   2
 3  1.000  .999  .987  .879  .650  .382  .172  .055  .011  .001  .000  .000  .000   3
 4  1.000 1.000  .998  .967  .850  .633  .377  .166  .047  .006  .000  .000  .000   4
 5  1.000 1.000 1.000  .994  .953  .834  .623  .367  .150  .033  .002  .000  .000   5
 6  1.000 1.000 1.000  .999  .989  .945  .828  .618  .350  .121  .013  .001  .000   6
 7  1.000 1.000 1.000 1.000  .998  .988  .945  .833  .617  .322  .070  .012  .000   7
 8  1.000 1.000 1.000 1.000 1.000  .998  .989  .954  .851  .624  .264  .086  .004   8
 9  1.000 1.000 1.000 1.000 1.000 1.000  .999  .994  .972  .893  .651  .401  .096   9


Table 1 (Continued)

(c) n = 15

                                          p
 a   0.01  0.05  0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  0.95  0.99   a
 0   .860  .463  .206  .035  .005  .000  .000  .000  .000  .000  .000  .000  .000   0
 1   .990  .829  .549  .167  .035  .005  .000  .000  .000  .000  .000  .000  .000   1
 2  1.000  .964  .816  .398  .127  .027  .004  .000  .000  .000  .000  .000  .000   2
 3  1.000  .995  .944  .648  .297  .091  .018  .002  .000  .000  .000  .000  .000   3
 4  1.000  .999  .987  .836  .515  .217  .059  .009  .001  .000  .000  .000  .000   4
 5  1.000 1.000  .998  .939  .722  .403  .151  .034  .004  .000  .000  .000  .000   5
 6  1.000 1.000 1.000  .982  .869  .610  .304  .095  .015  .001  .000  .000  .000   6
 7  1.000 1.000 1.000  .996  .950  .787  .500  .213  .050  .004  .000  .000  .000   7
 8  1.000 1.000 1.000  .999  .985  .905  .696  .390  .131  .018  .000  .000  .000   8
 9  1.000 1.000 1.000 1.000  .996  .966  .849  .597  .278  .061  .002  .000  .000   9
10  1.000 1.000 1.000 1.000  .999  .991  .941  .783  .485  .164  .013  .001  .000  10
11  1.000 1.000 1.000 1.000 1.000  .998  .982  .909  .703  .352  .056  .005  .000  11
12  1.000 1.000 1.000 1.000 1.000 1.000  .996  .973  .873  .602  .184  .036  .000  12
13  1.000 1.000 1.000 1.000 1.000 1.000 1.000  .995  .965  .833  .451  .171  .010  13
14  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  .995  .965  .794  .537  .140  14
(d) n = 20

                                          p
 a   0.01  0.05  0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  0.95  0.99   a
 0   .818  .358  .122  .012  .001  .000  .000  .000  .000  .000  .000  .000  .000   0
 1   .983  .736  .392  .069  .008  .001  .000  .000  .000  .000  .000  .000  .000   1
 2   .999  .925  .677  .206  .035  .004  .000  .000  .000  .000  .000  .000  .000   2
 3  1.000  .984  .867  .411  .107  .016  .001  .000  .000  .000  .000  .000  .000   3
 4  1.000  .997  .957  .630  .238  .051  .006  .000  .000  .000  .000  .000  .000   4
 5  1.000 1.000  .989  .804  .416  .126  .021  .002  .000  .000  .000  .000  .000   5
 6  1.000 1.000  .998  .913  .608  .250  .058  .006  .000  .000  .000  .000  .000   6
 7  1.000 1.000 1.000  .968  .772  .416  .132  .021  .001  .000  .000  .000  .000   7
 8  1.000 1.000 1.000  .990  .887  .596  .252  .057  .005  .000  .000  .000  .000   8
 9  1.000 1.000 1.000  .997  .952  .755  .412  .128  .017  .001  .000  .000  .000   9
10  1.000 1.000 1.000  .999  .983  .872  .588  .245  .048  .003  .000  .000  .000  10
11  1.000 1.000 1.000 1.000  .995  .943  .748  .404  .113  .010  .000  .000  .000  11
12  1.000 1.000 1.000 1.000  .999  .979  .868  .584  .228  .032  .000  .000  .000  12
13  1.000 1.000 1.000 1.000 1.000  .994  .942  .750  .392  .087  .002  .000  .000  13
14  1.000 1.000 1.000 1.000 1.000  .998  .979  .874  .584  .196  .011  .000  .000  14
15  1.000 1.000 1.000 1.000 1.000 1.000  .994  .949  .762  .370  .043  .003  .000  15
16  1.000 1.000 1.000 1.000 1.000 1.000  .999  .984  .893  .589  .133  .016  .000  16
17  1.000 1.000 1.000 1.000 1.000 1.000 1.000  .996  .965  .794  .323  .075  .001  17
18  1.000 1.000 1.000 1.000 1.000 1.000 1.000  .999  .992  .931  .608  .264  .017  18
19  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  .999  .988  .878  .642  .182  19


Table 1 (Continued)

(e) n = 25

                                          p
 a   0.01  0.05  0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  0.95  0.99   a
 0   .778  .277  .072  .004  .000  .000  .000  .000  .000  .000  .000  .000  .000   0
 1   .974  .642  .271  .027  .002  .000  .000  .000  .000  .000  .000  .000  .000   1
 2   .998  .873  .537  .098  .009  .000  .000  .000  .000  .000  .000  .000  .000   2
 3  1.000  .966  .764  .234  .033  .002  .000  .000  .000  .000  .000  .000  .000   3
 4  1.000  .993  .902  .421  .090  .009  .000  .000  .000  .000  .000  .000  .000   4
 5  1.000  .999  .967  .617  .193  .029  .002  .000  .000  .000  .000  .000  .000   5
 6  1.000 1.000  .991  .780  .341  .074  .007  .000  .000  .000  .000  .000  .000   6
 7  1.000 1.000  .998  .891  .512  .154  .022  .001  .000  .000  .000  .000  .000   7
 8  1.000 1.000 1.000  .953  .677  .274  .054  .004  .000  .000  .000  .000  .000   8
 9  1.000 1.000 1.000  .983  .811  .425  .115  .013  .000  .000  .000  .000  .000   9
10  1.000 1.000 1.000  .994  .902  .586  .212  .034  .002  .000  .000  .000  .000  10
11  1.000 1.000 1.000  .998  .956  .732  .345  .078  .006  .000  .000  .000  .000  11
12  1.000 1.000 1.000 1.000  .983  .846  .500  .154  .017  .000  .000  .000  .000  12
13  1.000 1.000 1.000 1.000  .994  .922  .655  .268  .044  .002  .000  .000  .000  13
14  1.000 1.000 1.000 1.000  .998  .966  .788  .414  .098  .006  .000  .000  .000  14
15  1.000 1.000 1.000 1.000 1.000  .987  .885  .575  .189  .017  .000  .000  .000  15
16  1.000 1.000 1.000 1.000 1.000  .996  .946  .726  .323  .047  .000  .000  .000  16
17  1.000 1.000 1.000 1.000 1.000  .999  .978  .846  .488  .109  .002  .000  .000  17
18  1.000 1.000 1.000 1.000 1.000 1.000  .993  .926  .659  .220  .009  .000  .000  18
19  1.000 1.000 1.000 1.000 1.000 1.000  .998  .971  .807  .383  .033  .001  .000  19
20  1.000 1.000 1.000 1.000 1.000 1.000 1.000  .991  .910  .579  .098  .007  .000  20
21  1.000 1.000 1.000 1.000 1.000 1.000 1.000  .998  .967  .766  .236  .034  .000  21
22  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  .991  .902  .463  .127  .002  22
23  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  .998  .973  .729  .358  .026  23
24  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  .996  .928  .723  .222  24
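
Entries of Table 1 can be regenerated from the binomial probability function. The sketch below is illustrative; the function name `binomial_cdf` is an assumption. It also uses the standard identity $P(Y \le a \mid n, p) = 1 - P(Y \le n - a - 1 \mid n, 1 - p)$, which relates the left and right halves of each panel and is handy for checking a doubtful entry.

```python
import math

def binomial_cdf(a, n, p):
    """P(Y <= a) for a binomial random variable with parameters n and p."""
    return sum(math.comb(n, y) * p ** y * (1 - p) ** (n - y)
               for y in range(a + 1))

# Reproduce a table entry: n = 5, p = 0.10, a = 2.
print(round(binomial_cdf(2, 5, 0.10), 3))

# Symmetry check between the p = 0.70 and p = 0.30 columns for n = 20.
left = binomial_cdf(16, 20, 0.70)
right = 1 - binomial_cdf(3, 20, 0.30)
print(abs(left - right) < 1e-12)
```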
Table 2 Table of $e^{-x}$

   x     e^{-x}        x     e^{-x}        x     e^{-x}        x     e^{-x}
 0.00   1.000000     2.60   .074274      5.10   .006097      7.60   .000501
 0.10    .904837     2.70   .067206      5.20   .005517      7.70   .000453
 0.20    .818731     2.80   .060810      5.30   .004992      7.80   .000410
 0.30    .740818     2.90   .055023      5.40   .004517      7.90   .000371
 0.40    .670320     3.00   .049787      5.50   .004087      8.00   .000336
 0.50    .606531     3.10   .045049      5.60   .003698      8.10   .000304
 0.60    .548812     3.20   .040762      5.70   .003346      8.20   .000275
 0.70    .496585     3.30   .036883      5.80   .003028      8.30   .000249
 0.80    .449329     3.40   .033373      5.90   .002739      8.40   .000225
 0.90    .406570     3.50   .030197      6.00   .002479      8.50   .000204
 1.00    .367879     3.60   .027324      6.10   .002243      8.60   .000184
 1.10    .332871     3.70   .024724      6.20   .002029      8.70   .000167
 1.20    .301194     3.80   .022371      6.30   .001836      8.80   .000151
 1.30    .272532     3.90   .020242      6.40   .001661      8.90   .000136
 1.40    .246597     4.00   .018316      6.50   .001503      9.00   .000123
 1.50    .223130     4.10   .016573      6.60   .001360      9.10   .000112
 1.60    .201897     4.20   .014996      6.70   .001231      9.20   .000101
 1.70    .182684     4.30   .013569      6.80   .001114      9.30   .000091
 1.80    .165299     4.40   .012277      6.90   .001008      9.40   .000083
 1.90    .149569     4.50   .011109      7.00   .000912      9.50   .000075
 2.00    .135335     4.60   .010052      7.10   .000825      9.60   .000068
 2.10    .122456     4.70   .009095      7.20   .000747      9.70   .000061
 2.20    .110803     4.80   .008230      7.30   .000676      9.80   .000056
 2.30    .100259     4.90   .007447      7.40   .000611      9.90   .000050
 2.40    .090718     5.00   .006738      7.50   .000553     10.00   .000045
 2.50    .082085


Table 3 Poisson Probabilities

$P(Y \le a) = \sum_{y=0}^{a} e^{-\lambda} \dfrac{\lambda^y}{y!}$

a
λ 0 1 2 3 4 5 6 7 8 9
0.02 0.980 1.000 
0.04 0.961 0.999 1.000 
0.06 0.942 0.998 1.000 
0.08 0.923 0.997 1.000 
0.10 0.905 0.995 1.000 
0.15 0.861 0.990 0.999 1.000 
0.20 0.819 0.982 0.999 1.000 
0.25 0.779 0.974 0.998 1.000 
0.30 0.741 0.963 0.996 1.000 
0.35 0.705 0.951 0.994 1.000 
0.40 0.670 0.938 0.992 0.999 1.000 
0.45 0.638 0.925 0.989 0.999 1.000 
0.50 0.607 0.910 0.986 0.998 1.000 
0.55 0.577 0.894 0.982 0.998 1.000
0.60 0.549 0.878 0.977 0.997 1.000 
0.65 0.522 0.861 0.972 0.996 0.999 1.000 
0.70 0.497 0.844 0.966 0.994 0.999 1.000 
0.75 0.472 0.827 0.959 0.993 0.999 1.000
0.80 0.449 0.809 0.953 0.991 0.999 1.000 
0.85 0.427 0.791 0.945 0.989 0.998 1.000 
0.90 0.407 0.772 0.937 0.987 0.998 1.000 
0.95 0.387 0.754 0.929 0.984 0.997 1.000
1.00 0.368 0.736 0.920 0.981 0.996 0.999 1.000 
1.1 0.333 0.699 0.900 0.974 0.995 0.999 1.000 
1.2 0.301 0.663 0.879 0.966 0.992 0.998 1.000 
1.3 0.273 0.627 0.857 0.957 0.989 0.998 1.000 
1.4 0.247 0.592 0.833 0.946 0.986 0.997 0.999 1.000 
1.5 0.223 0.558 0.809 0.934 0.981 0.996 0.999 1.000
1.6 0.202 0.525 0.783 0.921 0.976 0.994 0.999 1.000 
1.7 0.183 0.493 0.757 0.907 0.970 0.992 0.998 1.000
1.8 0.165 0.463 0.731 0.891 0.964 0.990 0.997 0.999 1.000 
1.9 0.150 0.434 0.704 0.875 0.956 0.987 0.997 0.999 1.000
2.0 0.135 0.406 0.677 0.857 0.947 0.983 0.995 0.999 1.000 
λ 0 1 2 3 4 5 6 7 8 9

2.2 0.111 0.355 0.623 0.819 0.928 0.975 0.993 0.998 1.000
2.4 0.091 0.308 0.570 0.779 0.904 0.964 0.988 0.997 0.999 1.000 
2.6 0.074 0.267 0.518 0.736 0.877 0.951 0.983 0.995 0.999 1.000 
2.8 0.061 0.231 0.469 0.692 0.848 0.935 0.976 0.992 0.998 0.999 
3.0 0.050 0.199 0.423 0.647 0.815 0.916 0.966 0.988 0.996 0.999
3.2 0.041 0.171 0.380 0.603 0.781 0.895 0.955 0.983 0.994 0.998
3.4 0.033 0.147 0.340 0.558 0.744 0.871 0.942 0.977 0.992 0.997 
3.6 0.027 0.126 0.303 0.515 0.706 0.844 0.927 0.969 0.988 0.996 
3.8 0.022 0.107 0.269 0.473 0.668 0.816 0.909 0.960 0.984 0.994 
4.0 0.018 0.092 0.238 0.433 0.629 0.785 0.889 0.949 0.979 0.992 
4.2 0.015 0.078 0.210 0.395 0.590 0.753 0.867 0.936 0.972 0.989
4.4 0.012 0.066 0.185 0.359 0.551 0.720 0.844 0.921 0.964 0.985
4.6 0.010 0.056 0.163 0.326 0.513 0.686 0.818 0.905 0.955 0.980
4.8 0.008 0.048 0.143 0.294 0.476 0.651 0.791 0.887 0.944 0.975 
5.0 0.007 0.040 0.125 0.265 0.440 0.616 0.762 0.867 0.932 0.968
5.2 0.006 0.034 0.109 0.238 0.406 0.581 0.732 0.845 0.918 0.960 
5.4 0.005 0.029 0.095 0.213 0.373 0.546 0.702 0.822 0.903 0.951 
5.6 0.004 0.024 0.082 0.191 0.342 0.512 0.670 0.797 0.886 0.941 
5.8 0.003 0.021 0.072 0.170 0.313 0.478 0.638 0.771 0.867 0.929 
6.0 0.002 0.017 0.062 0.151 0.285 0.446 0.606 0.744 0.847 0.916 

10 11 12 13 14 15 16 
2.8 1.000 
3.0 1.000 
3.2 1.000 
3.4 0.999 1.000 
3.6 0.999 1.000 
3.8 0.998 0.999 1.000 
4.0 0.997 0.999 1.000 
4.2 0.996 0.999 1.000
4.4 0.994 0.998 0.999 1.000
4.6 0.992 0.997 0.999 1.000
4.8 0.990 0.996 0.999 1.000
5.0 0.986 0.995 0.998 0.999 1.000
5.2 0.982 0.993 0.997 0.999 1.000 
5.4 0.977 0.990 0.996 0.999 1.000 
5.6 0.972 0.988 0.995 0.998 0.999 1.000 
5.8 0.965 0.984 0.993 0.997 0.999 1.000 
6.0 0.957 0.980 0.991 0.996 0.999 0.999 1.000 
Table 3 (Continued)

λ 0 1 2 3 4 5 6 7 8 9


6.2 0.002 0.015 0.054 0.134 0.259 0.414 0.574 0.716 0.826 0.902 
6.4 0.002 0.012 0.046 0.119 0.235 0.384 0.542 0.687 0.803 0.886 
6.6 0.001 0.010 0.040 0.105 0.213 0.355 0.511 0.658 0.780 0.869 
6.8 0.001 0.009 0.034 0.093 0.192 0.327 0.480 0.628 0.755 0.850 
7.0 0.001 0.007 0.030 0.082 0.173 0.301 0.450 0.599 0.729 0.830 


7.2 0.001 0.006 0.025 0.072 0.156 0.276 0.420 0.569 0.703 0.810 
7.4 0.001 0.005 0.022 0.063 0.140 0.253 0.392 0.539 0.676 0.788 
7.6 0.001 0.004 0.019 0.055 0.125 0.231 0.365 0.510 0.648 0.765 
7.8 0.000 0.004 0.016 0.048 0.112 0.210 0.338 0.481 0.620 0.741 


8.0 0.000 0.003 0.014 0.042 0.100 0.191 0.313 0.453 0.593 0.717 
8.5 0.000 0.002 0.009 0.030 0.074 0.150 0.256 0.386 0.523 0.653 
9.0 0.000 0.001 0.006 0.021 0.055 0.116 0.207 0.324 0.456 0.587 
9.5 0.000 0.001 0.004 0.015 0.040 0.089 0.165 0.269 0.392 0.522 
10.0 0.000 0.000 0.003 0.010 0.029 0.067 0.130 0.220 0.333 0.458 


10 11 12 13 14 15 16 17 18 19 


6.2 0.949 0.975 0.989 0.995 0.998 0.999 1.000 
6.4 0.939 0.969 0.986 0.994 0.997 0.999 1.000 
6.6 0.927 0.963 0.982 0.992 0.997 0.999 0.999 1.000 
6.8 0.915 0.955 0.978 0.990 0.996 0.998 0.999 1.000 
7.0 0.901 0.947 0.973 0.987 0.994 0.998 0.999 1.000


7.2 0.887 0.937 0.967 0.984 0.993 0.997 0.999 0.999 1.000
7.4 0.871 0.926 0.961 0.980 0.991 0.996 0.998 0.999 1.000
7.6 0.854 0.915 0.954 0.976 0.989 0.995 0.998 0.999 1.000
7.8 0.835 0.902 0.945 0.971 0.986 0.993 0.997 0.999 1.000 


8.0 0.816 0.888 0.936 0.966 0.983 0.992 0.996 0.998 0.999 1.000 
8.5 0.763 0.849 0.909 0.949 0.973 0.986 0.993 0.997 0.999 0.999 
9.0 0.706 0.803 0.876 0.926 0.959 0.978 0.989 0.995 0.998 0.999 
9.5 0.645 0.752 0.836 0.898 0.940 0.967 0.982 0.991 0.996 0.998
10.0 0.583 0.697 0.792 0.864 0.917 0.951 0.973 0.986 0.993 0.997


20 21 22 


9.5 0.999 1.000 
10.0 0.998 0.999 1.000 
   λ      0      1      2      3      4      5      6      7      8      9
 10.5  0.000  0.000  0.002  0.007  0.021  0.050  0.102  0.179  0.279  0.397
 11.0  0.000  0.000  0.001  0.005  0.015  0.038  0.079  0.143  0.232  0.341
 11.5  0.000  0.000  0.001  0.003  0.011  0.028  0.060  0.114  0.191  0.289
 12.0  0.000  0.000  0.001  0.002  0.008  0.020  0.046  0.090  0.155  0.242
 12.5  0.000  0.000  0.000  0.002  0.005  0.015  0.035  0.070  0.125  0.201

 13.0  0.000  0.000  0.000  0.001  0.004  0.011  0.026  0.054  0.100  0.166
 13.5  0.000  0.000  0.000  0.001  0.003  0.008  0.019  0.041  0.079  0.135
 14.0  0.000  0.000  0.000  0.000  0.002  0.006  0.014  0.032  0.062  0.109
 14.5  0.000  0.000  0.000  0.000  0.001  0.004  0.010  0.024  0.048  0.088
 15.0  0.000  0.000  0.000  0.000  0.001  0.003  0.008  0.018  0.037  0.070


   λ     10     12     13     14     15     16     17     18     19
 10.5  0.521  0.742  0.825  0.888  0.932  0.960  0.978  0.988  0.994
 11.0  0.460  0.689  0.781  0.854  0.907  0.944  0.968  0.982  0.991
 11.5  0.402  0.633  0.733  0.815  0.878  0.924  0.954  0.974  0.986
 12.0  0.347  0.576  0.682  0.772  0.844  0.899  0.937  0.963  0.979
 12.5  0.297  0.519  0.628  0.725  0.806  0.869  0.916  0.948  0.969

 13.0  0.252  0.463  0.573  0.675  0.764  0.835  0.890  0.930  0.957
 13.5  0.211  0.409  0.518  0.623  0.718  0.798  0.861  0.908  0.942
 14.0  0.176  0.358  0.464  0.570  0.669  0.756  0.827  0.883  0.923
 14.5  0.145  0.311  0.413  0.518  0.619  0.711  0.790  0.853  0.901
 15.0  0.118  0.268  0.363  0.466  0.568  0.664  0.749  0.819  0.875


   λ     20     22     23     24     25     26     27     28     29
 10.5  0.997  0.999  1.000
 11.0  0.995  0.999  1.000
 11.5  0.992  0.998  0.999  1.000
 12.0  0.988  0.997  0.999  0.999  1.000
 12.5  0.983  0.995  0.998  0.999  0.999  1.000

 13.0  0.975  0.992  0.996  0.998  0.999  1.000
 13.5  0.965  0.989  0.994  0.997  0.998  0.999  1.000
 14.0  0.952  0.983  0.991  0.995  0.997  0.999  0.999  1.000
 14.5  0.936  0.976  0.986  0.992  0.996  0.998  0.999  0.999  1.000
 15.0  0.917  0.967  0.981  0.989  0.994  0.997  0.998  0.999  1.000
Table 3 (Continued)
a
λ 4 5 6 7 8 9 10 11 12 13
16 0.000 0.001 0.004 0.010 0.022 0.043 0.077 0.127 0.193 0.275 
17 0.000 0.001 0.002 0.005 0.013 0.026 0.049 0.085 0.135 0.201 
18 0.000 0.000 0.001 0.003 0.007 0.015 0.030 0.055 0.092 0.143 
19 0.000 0.000 0.001 0.002 0.004 0.009 0.018 0.035 0.061 0.098 
20 0.000 0.000 0.000 0.001 0.002 0.005 0.011 0.021 0.039 0.066 
21 0.000 0.000 0.000 0.000 0.001 0.003 0.006 0.013 0.025 0.043 
22 0.000 0.000 0.000 0.000 0.001 0.002 0.004 0.008 0.015 0.028 
23 0.000 0.000 0.000 0.000 0.000 0.001 0.002 0.004 0.009 0.017 
24 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.003 0.005 0.011 
25 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.003 0.006 
14 15 16 17 18 19 20 21 22 23 
16 0.368 0.467 0.566 0.659 0.742 0.812 0.868 0.911 0.942 0.963 
17 0.281 0.371 0.468 0.564 0.655 0.736 0.805 0.861 0.905 0.937 
18 0.208 0.287 0.375 0.469 0.562 0.651 0.731 0.799 0.855 0.899 
19 0.150 0.215 0.292 0.378 0.469 0.561 0.647 0.725 0.793 0.849 
20 0.105 0.157 0.221 0.297 0.381 0.470 0.559 0.644 0.721 0.787 
21 0.072 0.111 0.163 0.227 0.302 0.384 0.471 0.558 0.640 0.716
22 0.048 0.077 0.117 0.169 0.232 0.306 0.387 0.472 0.556 0.637
23 0.031 0.052 0.082 0.123 0.175 0.238 0.310 0.389 0.472 0.555
24 0.020 0.034 0.056 0.087 0.128 0.180 0.243 0.314 0.392 0.473
25 0.012 0.022 0.038 0.060 0.092 0.134 0.185 0.247 0.318 0.394
24 25 26 27 28 29 30 31 32 33 
16 0.978 0.987 0.993 0.996 0.998 0.999 0.999 1.000 
17 0.959 0.975 0.985 0.991 0.995 0.997 0.999 0.999 1.000 
18 0.932 0.955 0.972 0.983 0.990 0.994 0.997 0.998 0.999 1.000 
19 0.893 0.927 0.951 0.969 0.980 0.988 0.993 0.996 0.998 0.999 
20 0.843 0.888 0.922 0.948 0.966 0.978 0.987 0.992 0.995 0.997 
21 0.782 0.838 0.883 0.917 0.944 0.963 0.976 0.985 0.991 0.994 
22 0.712 0.777 0.832 0.877 0.913 0.940 0.959 0.973 0.983 0.989
23 0.635 0.708 0.772 0.827 0.873 0.908 0.936 0.956 0.971 0.981 
24 0.554 0.632 0.704 0.768 0.823 0.868 0.904 0.932 0.953 0.969 
25 0.473 0.553 0.629 0.700 0.763 0.818 0.863 0.900 0.929 0.950 
34 35 36 37 38 39 40 41 42 43 
19 0.999 1.000 
20 0.999 0.999 1.000 
21 0.997 0.998 0.999 0.999 1.000
22 0.994 0.996 0.998 0.999 0.999 1.000 
23 0.988 0.993 0.996 0.997 0.999 0.999 1.000
24 0.979 0.987 0.992 0.995 0.997 0.998 0.999 0.999 1.000
25 0.966 0.978 0.985 0.991 0.991 0.997 0.998 0.999 0.999 1.000 
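
Entries of Table 3 follow directly from the Poisson probability function. The sketch below is a minimal illustration; the function name `poisson_cdf` is an assumption, and the running-term update is one common way to avoid computing large factorials.

```python
import math

def poisson_cdf(a, lam):
    """P(Y <= a) for a Poisson random variable with mean lam."""
    term = math.exp(-lam)   # p(0)
    total = term
    for y in range(1, a + 1):
        term *= lam / y     # p(y) = p(y - 1) * lam / y
        total += term
    return total

# Compare with the table rows for lambda = 1.00 and lambda = 2.0.
print(round(poisson_cdf(1, 1.0), 3), round(poisson_cdf(2, 2.0), 3))
```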
Table 4 Normal Curve Areas
Standard normal probability in right-hand tail
(for negative values of z, areas are found by symmetry)

                      Second decimal place of z
  z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
 0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
 0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
 0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
 0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
 0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
 0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
 0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
 0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
 0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
 1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
 1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
 1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
 1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
 1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0722  .0708  .0694  .0681
 1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
 1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
 1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
 1.8  .0359  .0352  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
 1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
 2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
 2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
 2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
 2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
 2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
 2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
 2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
 2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
 2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
 2.9  .0019  .0018  .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014
 3.0  .00135
 3.5  .000 233
 4.0  .000 0317
 4.5  .000 003 40
 5.0  .000 000 287


From R. E. Walpole, Introduction to Statistics (New York: Macmillan, 1968).
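
The right-hand tail areas of Table 4 can be computed with the standard library's `statistics.NormalDist`. The sketch below is illustrative; the helper name `upper_tail` is an assumption.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)

# Row 1.9, column .06 of the table corresponds to z = 1.96.
print(round(upper_tail(1.96), 4))

# For negative z the area follows by symmetry: P(Z > -z) = 1 - P(Z > z).
print(abs(upper_tail(-1.0) - (1.0 - upper_tail(1.0))) < 1e-12)
```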


Table 5 Percentage Points of the t Distributions


t_{.100} t_{.050} t_{.025} t_{.010} t_{.005} df
3.078 6.314 12.706 31.821 63.657 1 
1.886 2.920 4.303 6.965 9.925 2 
1.638 2.353 3.182 4.541 5.841 3 
1.533 2.132 2.776 3.747 4.604 4 
1.476 2.015 2.571 3.365 4.032 5 
1.440 1.943 2.447 3.143 3.707 6 
1.415 1.895 2.365 2.998 3.499 7 
1.397 1.860 2.306 2.896 3.355 8 
1.383 1.833 2.262 2.821 3.250 9 
1.372 1.812 2.228 2.764 3.169 10 
1.363 1.796 2.201 2.718 3.106 11 
1.356 1.782 2.179 2.681 3.055 12 
1.350 1.771 2.160 2.650 3.012 13 
1.345 1.761 2.145 2.624 2.977 14 
1.341 1.753 2.131 2.602 2.947 15 
1.337 1.746 2.120 2.583 2.921 16 
1.333 1.740 2.110 2.567 2.898 17 
1.330 1.734 2.101 2.552 2.878 18 
1.328 1.729 2.093 2.539 2.861 19 
1.325 1.725 2.086 2.528 2.845 20 
1.323 1.721 2.080 2.518 2.831 21 
1.321 1.717 2.074 2.508 2.819 22 
1.319 1.714 2.069 2.500 2.807 23 
1.318 1.711 2.064 2.492 2.797 24 
1.316 1.708 2.060 2.485 2.787 25 
1.315 1.706 2.056 2.479 2.779 26 
1.314 1.703 2.052 2.473 2.771 27
1.313 1.701 2.048 2.467 2.763 28 
1.311 1.699 2.045 2.462 2.756 29 
1.282 1.645 1.960 2.326 2.576 inf. 


From "Table of Percentage Points of the t-Distribution." Computed by
Maxine Merrington, Biometrika, Vol. 32 (1941), p. 300. 
Table 6 Percentage Points of the $\chi^2$ Distributions


df $\chi^2_{0.995}$ $\chi^2_{0.990}$ $\chi^2_{0.975}$ $\chi^2_{0.950}$ $\chi^2_{0.900}$
1 0.0000393 0.0001571 0.0009821 0.0039321 0.0157908 
2 0.0100251 0.0201007 0.0506356 0.102587 0.210720 
3 0.0717212 0.114832 0.215795 0.351846 0.584375 
4 0.206990 0.297110 0.484419 0.710721 1.063623 
5 0.411740 0.554300 0.831211 1.145476 1.61031 
6 0.675727 0.872085 1.237347 1.63539 2.20413 
7 0.989265 1.239043 1.68987 2.16735 2.83311 
8 1.344419 1.646482 2.17973 2.73264 3.48954 
9 1.734926 2.087912 2.70039 3.32511 4.16816
10 2.15585 2.55821 3.24697 3.94030 4.86518 
11 2.60321 3.05347 3.81575 4.57481 5.57779 
12 3.07382 3.57056 4.40379 5.22603 6.30380 
13 3.56503 4.10691 5.00874 5.89186 7.04150 
14 4.07468 4.66043 5.62872 6.57063 7.78953 
15 4.60094 5.22935 6.26214 7.26094 8.54675 
16 5.14224 5.81221 6.90766 7.96164 9.31223 
17 5.69724 6.40776 7.56418 8.67176 10.0852 
18 6.26481 7.01491 8.23075 9.39046 10.8649 
19 6.84398 7.63273 8.90655 10.1170 11.6509 
20 7.43386 8.26040 9.59083 10.8508 12.4426 
21 8.03366 8.89720 10.28293 11.5913 13.2396 
22 8.64272 9.54249 10.9823 12.3380 14.0415 
23 9.26042 10.19567 11.6885 13.0905 14.8479 
24 9.88623 10.8564 12.4011 13.8484 15.6587 
25 10.5197 11.5240 13.1197 14.6114 16.4734 
26 11.1603 12.1981 13.8439 15.3791 17.2919 
27 11.8076 12.8786 14.5733 16.1513 18.1138 
28 12.4613 13.5648 15.3079 16.9279 18.9392 
29 13.1211 14.2565 16.0471 17.7083 19.7677 
30 13.7867 14.9535 16.7908 18.4926 20.5992 
40 20.7065 22.1643 24.4331 26.5093 29.0505 
50 27.9907 29.7067 32.3574 34.7642 37.6886 
60 35.5346 37.4848 40.4817 43.1879 46.4589 
70 43.2752 45.4418 48.7576 51.7393 55.3290 
80 51.1720 53.5400 57.1532 60.3915 64.2778
90 59.1963 61.7541 65.6466 69.1260 73.2912 


100 67.3276 70.0648 74.2219 77.9295 82.3581 
Table 6 (Continued)


$\chi^2_{0.100}$ $\chi^2_{0.050}$ $\chi^2_{0.025}$ $\chi^2_{0.010}$ $\chi^2_{0.005}$ df


2.70554 3.84146 5.02389 6.63490 7.87944 1
4.60517 5.99147 7.37776 9.21034 10.5966 2
6.25139 7.81473 9.34840 11.3449 12.8381 3
7.77944 9.48773 11.1433 13.2767 14.8602 4
9.23635 11.0705 12.8325 15.0863 16.7496 5
10.6446 12.5916 14.4494 16.8119 18.5476 6 
12.0170 14.0671 16.0128 18.4753 20.2777 7 
13.3616 15.5073 17.5346 20.0902 21.9550 8 
14.6837 16.9190 19.0228 21.6660 23.5893 9 


15.9871 18.3070 20.4831 23.2093 25.1882 10 
17.2750 19.6751 21.9200 24.7250 26.7569 11 
18.5494 21.0261 23.3367 26.2170 28.2995 12 
19.8119 22.3621 24.7356 27.6883 29.8194 13 
21.0642 23.6848 26.1190 29.1413 31.3193 14 


22.3072 24.9958 27.4884 30.5779 32.8013 15 
23.5418 26.2962 28.8454 31.9999 34.2672 16 
24.7690 27.5871 30.1910 33.4087 35.7185 17 
25.9894 28.8693 31.5264 34.8053 37.1564 18 
27.2036 30.1435 32.8523 36.1908 38.5822 19 


28.4120 31.4104 34.1696 37.5662 39.9968 20
29.6151 32.6705 35.4789 38.9321 41.4010 21 
30.8133 33.9244 36.7807 40.2894 42.7956 22 
32.0069 35.1725 38.0757 41.6384 44.1813 23 
33.1963 36.4151 39.3641 42.9798 45.5585 24 


34.3816 37.6525 40.6465 44.3141 46.9278 25 
35.5631 38.8852 41.9232 45.6417 48.2899 26 
36.7412 40.1133 43.1944 46.9630 49.6449 27 
37.9159 41.3372 44.4607 48.2782 50.9933 28 
39.0875 42.5569 45.7222 49.5879 52.3356 29 


40.2560 43.7729 46.9792 50.8922 53.6720 30 
51.8050 55.7585 59.3417 63.6907 66.7659 40 
63.1671 67.5048 71.4202 76.1539 79.4900 50 
74.3970 79.0819 83.2976 88.3794 91.9517 60 


85.5271 90.5312 95.0231 100.425 104.215 70 
96.5782 101.879 106.629 112.329 116.321 80 
107.565 113.145 118.136 124.116 128.299 90 


118.498 124.342 129.561 135.807 140.169 100 


From "Tables of the Percentage Points of the $\chi^2$-Distribution." Biometrika, Vol. 32
(1941), pp. 188-189, by Catherine M. Thompson. 
Table 7 Percentage Points of the F Distributions


Numerator df
Denominator
df α 1 2 3 4 5 6 7 8 9

1 .100 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 
.050 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 
.025 647.8 799.5 864.2 899.6 921.8 937.1 948.2 956.7 963.3
.010 4052 4999.5 5403 5625 5764 5859 5928 5982 6022
.005 16211 20000 21615 22500 23056 23437 23715 23925 24091 

2 .100 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 
.050 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 
.025 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39 
.010 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39
.005 198.5 199.0 199.2 199.2 199.3 199.3 199.4 199.4 199.4

3 .100 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 
.050 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 
.025 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47 
.010 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 
.005 55.55 49.80 47.47 46.19 45.39 44.84 44.43 44.13 43.88

4 .100 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 
.050 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 
‚025 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90 
.010 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 
‚005 31.33 26.28 24.26 23.15 22.46 21.97 21.62 21.35 21.14 

5 .100 4.06 3.78 3.62 3.52 3.45 3.40 3:37. 3.34 3.32 
.050 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 
‚025 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 
010 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 
‚005 22.78 18.31 16.53 15.56 14.94 14.51 14.20 13.96 13.77 

6 .100 3.78 3.46 3.29 3.18 2-11 3.05 3.01 2.98 2.96 
.050 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 
.025 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52 
.010 13:75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 
.005 18.63 14.54 12.92 12.03 11.46 11.07 10.79 10.57 10.39 

7 .100 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2:72 
.050 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 
.025 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82 
.010 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 
.005 16.24 12.40 10.88 10.05 9.52 9.16 8.89 8.68 8.51 
Table 7 (Continued)


Numerator df
10 12 15 20 24 30 40 60 120 ∞ α df

60.19 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.33 .100 1
241.9 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3 .050
968.6 976.7 984.9 993.1 997.2 1001 1006 1010 1014 1018 .025
6056 6106 6157 6209 6235 6261 6287 6313 6339 6366 .010
24224 24426 24630 24836 24940 25044 25148 25253 25359 25465 .005

9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49 .100 2
19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50 .050
39.40 39.41 39.43 39.45 39.46 39.46 39.47 39.48 39.49 39.50 .025
99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50 .010
199.4 199.4 199.4 199.4 199.5 199.5 199.5 199.5 199.5 199.5 .005

5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13 .100 3
8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53 .050
14.42 14.34 14.25 14.17 14.12 14.08 14.04 13.99 13.95 13.90 .025
27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13 .010
43.69 43.39 43.08 42.78 42.62 42.47 42.31 42.15 41.99 41.83 .005

3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76 .100 4
5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63 .050
8.84 8.75 8.66 8.56 8.51 8.46 8.41 8.36 8.31 8.26 .025
14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46 .010
20.97 20.70 20.44 20.17 20.03 19.89 19.75 19.61 19.47 19.32 .005

3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10 .100 5
4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36 .050
6.62 6.52 6.43 6.33 6.28 6.23 6.18 6.12 6.07 6.02 .025
10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02 .010
13.62 13.38 13.15 12.90 12.78 12.66 12.53 12.40 12.27 12.14 .005

2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72 .100 6
4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67 .050
5.46 5.37 5.27 5.17 5.12 5.07 5.01 4.96 4.90 4.85 .025
7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88 .010
10.25 10.03 9.81 9.59 9.47 9.36 9.24 9.12 9.00 8.88 .005

2.70 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47 .100 7
3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23 .050
4.76 4.67 4.57 4.47 4.42 4.36 4.31 4.25 4.20 4.14 .025
6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65 .010
8.38 8.18 7.97 7.75 7.65 7.53 7.42 7.31 7.19 7.08 .005
Table 7 (Continued)


Numerator df
Denominator

df α 1 2 3 4 5 6 7 8 9

8 .100 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56
.050 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39
.025 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36
.010 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91
.005 14.69 11.04 9.60 8.81 8.30 7.95 7.69 7.50 7.34

9 .100 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44
.050 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18
.025 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03
.010 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35
.005 13.61 10.11 8.72 7.96 7.47 7.13 6.88 6.69 6.54

10 .100 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35
.050 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02
.025 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78
.010 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94
.005 12.83 9.43 8.08 7.34 6.87 6.54 6.30 6.12 5.97

11 .100 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27
.050 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90
.025 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59
.010 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63
.005 12.23 8.91 7.60 6.88 6.42 6.10 5.86 5.68 5.54

12 .100 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21
.050 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80
.025 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44
.010 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39
.005 11.75 8.51 7.23 6.52 6.07 5.76 5.52 5.35 5.20

13 .100 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16
.050 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71
.025 6.41 4.97 4.35 4.00 3.77 3.60 3.48 3.39 3.31
.010 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19
.005 11.37 8.19 6.93 6.23 5.79 5.48 5.25 5.08 4.94

14 .100 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12
.050 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65
.025 6.30 4.86 4.24 3.89 3.66 3.50 3.38 3.29 3.21
.010 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03
.005 11.06 7.92 6.68 6.00 5.56 5.26 5.03 4.86 4.72
Table 7 (Continued)


Numerator df
10 12 15 20 24 30 40 60 120 ∞ α df

2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.29 .100 8
3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93 .050
4.30 4.20 4.10 4.00 3.95 3.89 3.84 3.78 3.73 3.67 .025
5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86 .010
7.21 7.01 6.81 6.61 6.50 6.40 6.29 6.18 6.06 5.95 .005

2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16 .100 9
3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71 .050
3.96 3.87 3.77 3.67 3.61 3.56 3.51 3.45 3.39 3.33 .025
5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31 .010
6.42 6.23 6.03 5.83 5.73 5.62 5.52 5.41 5.30 5.19 .005

2.32 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06 .100 10
2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54 .050
3.72 3.62 3.52 3.42 3.37 3.31 3.26 3.20 3.14 3.08 .025
4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91 .010
5.85 5.66 5.47 5.27 5.17 5.07 4.97 4.86 4.75 4.64 .005

2.25 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.97 .100 11
2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40 .050
3.53 3.43 3.33 3.23 3.17 3.12 3.06 3.00 2.94 2.88 .025
4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60 .010
5.42 5.24 5.05 4.86 4.76 4.65 4.55 4.44 4.34 4.23 .005

2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90 .100 12
2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30 .050
3.37 3.28 3.18 3.07 3.02 2.96 2.91 2.85 2.79 2.72 .025
4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36 .010
5.09 4.91 4.72 4.53 4.43 4.33 4.23 4.12 4.01 3.90 .005

2.14 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85 .100 13
2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21 .050
3.25 3.15 3.05 2.95 2.89 2.84 2.78 2.72 2.66 2.60 .025
4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17 .010
4.82 4.64 4.46 4.27 4.17 4.07 3.97 3.87 3.76 3.65 .005

2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80 .100 14
2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13 .050
3.15 3.05 2.95 2.84 2.79 2.73 2.67 2.61 2.55 2.49 .025
3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00 .010
4.60 4.43 4.25 4.06 3.96 3.86 3.76 3.66 3.55 3.44 .005
Table 7 (Continued)


Numerator df
Denominator

df α 1 2 3 4 5 6 7 8 9

15 .100 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09
.050 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59
.025 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12
.010 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89
.005 10.80 7.70 6.48 5.80 5.37 5.07 4.85 4.67 4.54

16 .100 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06
.050 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54
.025 6.12 4.69 4.08 3.73 3.50 3.34 3.22 3.12 3.05
.010 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78
.005 10.58 7.51 6.30 5.64 5.21 4.91 4.69 4.52 4.38

17 .100 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03
.050 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49
.025 6.04 4.62 4.01 3.66 3.44 3.28 3.16 3.06 2.98
.010 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68
.005 10.38 7.35 6.16 5.50 5.07 4.78 4.56 4.39 4.25

18 .100 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00
.050 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46
.025 5.98 4.56 3.95 3.61 3.38 3.22 3.10 3.01 2.93
.010 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60
.005 10.22 7.21 6.03 5.37 4.96 4.66 4.44 4.28 4.14

19 .100 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98
.050 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42
.025 5.92 4.51 3.90 3.56 3.33 3.17 3.05 2.96 2.88
.010 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52
.005 10.07 7.09 5.92 5.27 4.85 4.56 4.34 4.18 4.04

20 .100 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96
.050 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39
.025 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84
.010 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46
.005 9.94 6.99 5.82 5.17 4.76 4.47 4.26 4.09 3.96

21 .100 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95
.050 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37
.025 5.83 4.42 3.82 3.48 3.25 3.09 2.97 2.87 2.80
.010 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40
.005 9.83 6.89 5.73 5.09 4.68 4.39 4.18 4.01 3.88
Table 7 (Continued)


Numerator df
10 12 15 20 24 30 40 60 120 ∞ α df

2.06 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76 .100 15
2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07 .050
3.06 2.96 2.86 2.76 2.70 2.64 2.59 2.52 2.46 2.40 .025
3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87 .010
4.42 4.25 4.07 3.88 3.79 3.69 3.58 3.48 3.37 3.26 .005

2.03 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72 .100 16
2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01 .050
2.99 2.89 2.79 2.68 2.63 2.57 2.51 2.45 2.38 2.32 .025
3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75 .010
4.27 4.10 3.92 3.73 3.64 3.54 3.44 3.33 3.22 3.11 .005

2.00 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69 .100 17
2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96 .050
2.92 2.82 2.72 2.62 2.56 2.50 2.44 2.38 2.32 2.25 .025
3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65 .010
4.14 3.97 3.79 3.61 3.51 3.41 3.31 3.21 3.10 2.98 .005

1.98 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66 .100 18
2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92 .050
2.87 2.77 2.67 2.56 2.50 2.44 2.38 2.32 2.26 2.19 .025
3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57 .010
4.03 3.86 3.68 3.50 3.40 3.30 3.20 3.10 2.99 2.87 .005

1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63 .100 19
2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88 .050
2.82 2.72 2.62 2.51 2.45 2.39 2.33 2.27 2.20 2.13 .025
3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49 .010
3.93 3.76 3.59 3.40 3.31 3.21 3.11 3.00 2.89 2.78 .005

1.94 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61 .100 20
2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84 .050
2.77 2.68 2.57 2.46 2.41 2.35 2.29 2.22 2.16 2.09 .025
3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42 .010
3.85 3.68 3.50 3.32 3.22 3.12 3.02 2.92 2.81 2.69 .005

1.92 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59 .100 21
2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81 .050
2.73 2.64 2.53 2.42 2.37 2.31 2.25 2.18 2.11 2.04 .025
3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36 .010
3.77 3.60 3.43 3.24 3.15 3.05 2.95 2.84 2.73 2.61 .005
Table 7 (Continued)


Numerator df
Denominator

df α 2 4 5 6 7 8 9

22 .100 2.56 2.22 2.13 2.06 2.01 1.97 1.93
.050 3.44 2.82 2.66 2.55 2.46 2.40 2.34
.025 4.38 3.44 3.22 3.05 2.93 2.84 2.76
.010 5.72 4.31 3.99 3.76 3.59 3.45 3.35
.005 6.81 5.02 4.61 4.32 4.11 3.94 3.81

23 .100 2.55 2.21 2.11 2.05 1.99 1.95 1.92
.050 3.42 2.80 2.64 2.53 2.44 2.37 2.32
.025 4.35 3.41 3.18 3.02 2.90 2.81 2.73
.010 5.66 4.26 3.94 3.71 3.54 3.41 3.30
.005 6.73 4.95 4.54 4.26 4.05 3.88 3.75

24 .100 2.54 2.19 2.10 2.04 1.98 1.94 1.91
.050 3.40 2.78 2.62 2.51 2.42 2.36 2.30
.025 4.32 3.38 3.15 2.99 2.87 2.78 2.70
.010 5.61 4.22 3.90 3.67 3.50 3.36 3.26
.005 6.66 4.89 4.49 4.20 3.99 3.83 3.69

25 .100 2.53 2.18 2.09 2.02 1.97 1.93 1.89
.050 3.39 2.76 2.60 2.49 2.40 2.34 2.28
.025 4.29 3.35 3.13 2.97 2.85 2.75 2.68
.010 5.57 4.18 3.85 3.63 3.46 3.32 3.22
.005 6.60 4.84 4.43 4.15 3.94 3.78 3.64

26 .100 2.52 2.17 2.08 2.01 1.96 1.92 1.88
.050 3.37 2.74 2.59 2.47 2.39 2.32 2.27
.025 4.27 3.33 3.10 2.94 2.82 2.73 2.65
.010 5.53 4.14 3.82 3.59 3.42 3.29 3.18
.005 6.54 4.79 4.38 4.10 3.89 3.73 3.60

27 .100 2.51 2.17 2.07 2.00 1.95 1.91 1.87
.050 3.35 2.73 2.57 2.46 2.37 2.31 2.25
.025 4.24 3.31 3.08 2.92 2.80 2.71 2.63
.010 5.49 4.11 3.78 3.56 3.39 3.26 3.15
.005 6.49 4.74 4.34 4.06 3.85 3.69 3.56

28 .100 2.50 2.16 2.06 2.00 1.94 1.90 1.87
.050 3.34 2.71 2.56 2.45 2.36 2.29 2.24
.025 4.22 3.29 3.06 2.90 2.78 2.69 2.61
.010 5.45 4.07 3.75 3.53 3.36 3.23 3.12
.005 6.44 4.70 4.30 4.02 3.81 3.65 3.52

Table 7 (Continued)


Numerator df
10 12 15 20 24 30 40 60 120 ∞ α df

1.90 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57 .100 22
2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78 .050
2.70 2.60 2.50 2.39 2.33 2.27 2.21 2.14 2.08 2.00 .025
3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31 .010
3.70 3.54 3.36 3.18 3.08 2.98 2.88 2.77 2.66 2.55 .005

1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55 .100 23
2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76 .050
2.67 2.57 2.47 2.36 2.30 2.24 2.18 2.11 2.04 1.97 .025
3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26 .010
3.64 3.47 3.30 3.12 3.02 2.92 2.82 2.71 2.60 2.48 .005

1.88 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.53 .100 24
2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73 .050
2.64 2.54 2.44 2.33 2.27 2.21 2.15 2.08 2.01 1.94 .025
3.17 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21 .010
3.59 3.42 3.25 3.06 2.97 2.87 2.77 2.66 2.55 2.43 .005

1.87 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52 .100 25
2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71 .050
2.61 2.51 2.41 2.30 2.24 2.18 2.12 2.05 1.98 1.91 .025
3.13 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17 .010
3.54 3.37 3.20 3.01 2.92 2.82 2.72 2.61 2.50 2.38 .005

1.86 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.50 .100 26
2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69 .050
2.59 2.49 2.39 2.28 2.22 2.16 2.09 2.03 1.95 1.88 .025
3.09 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.13 .010
3.49 3.33 3.15 2.97 2.87 2.77 2.67 2.56 2.45 2.33 .005

1.85 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.49 .100 27
2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67 .050
2.57 2.47 2.36 2.25 2.19 2.13 2.07 2.00 1.93 1.85 .025
3.06 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10 .010
3.45 3.28 3.11 2.93 2.83 2.73 2.63 2.52 2.41 2.29 .005

1.84 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48 .100 28
2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65 .050
2.55 2.45 2.34 2.23 2.17 2.11 2.05 1.98 1.91 1.83 .025
3.03 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06 .010
3.41 3.25 3.07 2.89 2.79 2.69 2.59 2.48 2.37 2.25 .005
Table 7 (Continued)


Numerator df
Denominator

df α 1 2 3 4 5 6 7 8 9

29 .100 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86
.050 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22
.025 5.59 4.20 3.61 3.27 3.04 2.88 2.76 2.67 2.59
.010 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09
.005 9.23 6.40 5.28 4.66 4.26 3.98 3.77 3.61 3.48

30 .100 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85
.050 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21
.025 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57
.010 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07
.005 9.18 6.35 5.24 4.62 4.23 3.95 3.74 3.58 3.45

40 .100 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79
.050 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12
.025 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45
.010 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89
.005 8.83 6.07 4.98 4.37 3.99 3.71 3.51 3.35 3.22

60 .100 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74
.050 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04
.025 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33
.010 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72
.005 8.49 5.79 4.73 4.14 3.76 3.49 3.29 3.13 3.01

120 .100 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68
.050 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96
.025 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22
.010 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56
.005 8.18 5.54 4.50 3.92 3.55 3.28 3.09 2.93 2.81

∞ .100 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63
.050 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88
.025 5.02 3.69 3.12 2.79 2.57 2.41 2.29 2.19 2.11
.010 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41
.005 7.88 5.30 4.28 3.72 3.35 3.09 2.90 2.74 2.62


From “Tables of percentage points of the inverted beta (F) distribution.” Biometrika, Vol. 33 (1943) by M. Merrington and C. M. 
Thompson and from Table 18 of Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, 1954, edited by E. S. 
Pearson and H. O. Hartley. 
Table 7 (Continued)


Numerator df
10 12 15 20 24 30 40 60 120 ∞ α df

1.83 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47 .100 29
2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64 .050
2.53 2.43 2.32 2.21 2.15 2.09 2.03 1.96 1.89 1.81 .025
3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03 .010
3.38 3.21 3.04 2.86 2.76 2.66 2.56 2.45 2.33 2.21 .005

1.82 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46 .100 30
2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62 .050
2.51 2.41 2.31 2.20 2.14 2.07 2.01 1.94 1.87 1.79 .025
2.98 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01 .010
3.34 3.18 3.01 2.82 2.73 2.63 2.52 2.42 2.30 2.18 .005

1.76 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38 .100 40
2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51 .050
2.39 2.29 2.18 2.07 2.01 1.94 1.88 1.80 1.72 1.64 .025
2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80 .010
3.12 2.95 2.78 2.60 2.50 2.40 2.30 2.18 2.06 1.93 .005

1.71 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.29 .100 60
1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39 .050
2.27 2.17 2.06 1.94 1.88 1.82 1.74 1.67 1.58 1.48 .025
2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60 .010
2.90 2.74 2.57 2.39 2.29 2.19 2.08 1.96 1.83 1.69 .005

1.65 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.19 .100 120
1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25 .050
2.16 2.05 1.94 1.82 1.76 1.69 1.61 1.53 1.43 1.31 .025
2.47 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38 .010
2.71 2.54 2.37 2.19 2.09 1.98 1.87 1.75 1.61 1.43 .005

1.60 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00 .100 ∞
1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00 .050
2.05 1.94 1.83 1.71 1.64 1.57 1.48 1.39 1.27 1.00 .025
2.32 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00 .010
2.52 2.36 2.19 2.00 1.90 1.79 1.67 1.53 1.36 1.00 .005
Table 8 Distribution Function of U


P(U ≤ U0); U0 is the argument; n1 ≤ n2; 3 ≤ n2 ≤ 10.


n2 = 3
      n1
U0    1    2    3
 0  .25  .10  .05
 1  .50  .20  .10
 2       .40  .20
 3       .60  .35
 4            .50

n2 = 4
      n1
U0      1      2      3      4
 0  .2000  .0667  .0286  .0143
 1  .4000  .1333  .0571  .0286
 2  .6000  .2667  .1143  .0571
 3         .4000  .2000  .1000
 4         .6000  .3143  .1714
 5                .4286  .2429
 6                .5714  .3429
 7                       .4429
 8                       .5571


Table 8 (Continued)


n2 = 5
      n1
U0      1      2      3      4      5
 0  .1667  .0476  .0179  .0079  .0040
 1  .3333  .0952  .0357  .0159  .0079
 2  .5000  .1905  .0714  .0317  .0159
 3         .2857  .1250  .0556  .0278
 4         .4286  .1964  .0952  .0476
 5         .5714  .2857  .1429  .0754
 6                .3929  .2063  .1111
 7                .5000  .2778  .1548
 8                       .3651  .2103
 9                       .4524  .2738
10                       .5476  .3452
11                              .4206
12                              .5000

n2 = 6
      n1
U0      1      2      3      4      5      6
 0  .1429  .0357  .0119  .0048  .0022  .0011
 1  .2857  .0714  .0238  .0095  .0043  .0022
 2  .4286  .1429  .0476  .0190  .0087  .0043
 3  .5714  .2143  .0833  .0333  .0152  .0076
 4         .3214  .1310  .0571  .0260  .0130
 5         .4286  .1905  .0857  .0411  .0206
 6         .5714  .2738  .1286  .0628  .0325
 7                .3571  .1762  .0887  .0465
 8                .4524  .2381  .1234  .0660
 9                .5476  .3048  .1645  .0898
10                       .3810  .2143  .1201
11                       .4571  .2684  .1548
12                       .5429  .3312  .1970
13                              .3961  .2424
14                              .4654  .2944
15                              .5346  .3496
16                                     .4091
17                                     .4686
18                                     .5314
Table 8 (Continued)


n2 = 7
      n1
U0      1      2      3      4      5      6      7
 0  .1250  .0278  .0083  .0030  .0013  .0006  .0003
 1  .2500  .0556  .0167  .0061  .0025  .0012  .0006
 2  .3750  .1111  .0333  .0121  .0051  .0023  .0012
 3  .5000  .1667  .0583  .0212  .0088  .0041  .0020
 4         .2500  .0917  .0364  .0152  .0070  .0035
 5         .3333  .1333  .0545  .0240  .0111  .0055
 6         .4444  .1917  .0818  .0366  .0175  .0087
 7         .5556  .2583  .1152  .0530  .0256  .0131
 8                .3333  .1576  .0745  .0367  .0189
 9                .4167  .2061  .1010  .0507  .0265
10                .5000  .2636  .1338  .0688  .0364
11                       .3242  .1717  .0903  .0487
12                       .3939  .2159  .1171  .0641
13                       .4636  .2652  .1474  .0825
14                       .5364  .3194  .1830  .1043
15                              .3775  .2226  .1297
16                              .4381  .2669  .1588
17                              .5000  .3141  .1914
18                                     .3654  .2279
19                                     .4178  .2675
20                                     .4726  .3100
21                                     .5274  .3552
22                                            .4024
23                                            .4508
24                                            .5000


Table 8 (Continued)


n2 = 8
      n1
U0      1      2      3      4      5      6      7      8
 0  .1111  .0222  .0061  .0020  .0008  .0003  .0002  .0001
 1  .2222  .0444  .0121  .0040  .0016  .0007  .0003  .0002
 2  .3333  .0889  .0242  .0081  .0031  .0013  .0006  .0003
 3  .4444  .1333  .0424  .0141  .0054  .0023  .0011  .0005
 4  .5556  .2000  .0667  .0242  .0093  .0040  .0019  .0009
 5         .2667  .0970  .0364  .0148  .0063  .0030  .0015
 6         .3556  .1394  .0545  .0225  .0100  .0047  .0023
 7         .4444  .1879  .0768  .0326  .0147  .0070  .0035
 8         .5556  .2485  .1071  .0466  .0213  .0103  .0052
 9                .3152  .1414  .0637  .0296  .0145  .0074
10                .3879  .1838  .0855  .0406  .0200  .0103
11                .4606  .2303  .1111  .0539  .0270  .0141
12                .5394  .2848  .1422  .0709  .0361  .0190
13                       .3414  .1772  .0906  .0469  .0249
14                       .4040  .2176  .1142  .0603  .0325
15                       .4667  .2618  .1412  .0760  .0415
16                       .5333  .3108  .1725  .0946  .0524
17                              .3621  .2068  .1159  .0652
18                              .4165  .2454  .1405  .0803
19                              .4716  .2864  .1678  .0974
20                              .5284  .3310  .1984  .1172
21                                     .3773  .2317  .1393
22                                     .4259  .2679  .1641
23                                     .4749  .3063  .1911
24                                     .5251  .3472  .2209
25                                            .3894  .2527
26                                            .4333  .2869
27                                            .4775  .3227
28                                            .5225  .3605
29                                                   .3992
30                                                   .4392
31                                                   .4796
32                                                   .5204
Table 8 (Continued)


n2 = 9
      n1
U0      1      2      3      4      5      6      7      8      9
 0  .1000  .0182  .0045  .0014  .0005  .0002  .0001  .0000  .0000
 1  .2000  .0364  .0091  .0028  .0010  .0004  .0002  .0001  .0000
 2  .3000  .0727  .0182  .0056  .0020  .0008  .0003  .0002  .0001
 3  .4000  .1091  .0318  .0098  .0035  .0014  .0006  .0003  .0001
 4  .5000  .1636  .0500  .0168  .0060  .0024  .0010  .0005  .0002
 5         .2182  .0727  .0252  .0095  .0038  .0017  .0008  .0004
 6         .2909  .1045  .0378  .0145  .0060  .0026  .0012  .0006
 7         .3636  .1409  .0531  .0210  .0088  .0039  .0019  .0009
 8         .4545  .1864  .0741  .0300  .0128  .0058  .0028  .0014
 9         .5455  .2409  .0993  .0415  .0180  .0082  .0039  .0020
10                .3000  .1301  .0559  .0248  .0115  .0056  .0028
11                .3636  .1650  .0734  .0332  .0156  .0076  .0039
12                .4318  .2070  .0949  .0440  .0209  .0103  .0053
13                .5000  .2517  .1199  .0567  .0274  .0137  .0071
14                       .3021  .1489  .0723  .0356  .0180  .0094
15                       .3552  .1818  .0905  .0454  .0232  .0122
16                       .4126  .2188  .1119  .0571  .0296  .0157
17                       .4699  .2592  .1361  .0708  .0372  .0200
18                       .5301  .3032  .1638  .0869  .0464  .0252
19                              .3497  .1942  .1052  .0570  .0313
20                              .3986  .2280  .1261  .0694  .0385
21                              .4491  .2643  .1496  .0836  .0470
22                              .5000  .3035  .1755  .0998  .0567
23                                     .3445  .2039  .1179  .0680
24                                     .3878  .2349  .1383  .0807
25                                     .4320  .2680  .1606  .0951
26                                     .4773  .3032  .1852  .1112
27                                     .5227  .3403  .2117  .1290
28                                            .3788  .2404  .1487
29                                            .4185  .2707  .1701
30                                            .4591  .3029  .1933
31                                            .5000  .3365  .2181
32                                                   .3715  .2447
33                                                   .4074  .2729
34                                                   .4442  .3024
35                                                   .4813  .3332
36                                                   .5187  .3652
37                                                          .3981
38                                                          .4317
39                                                          .4657
40                                                          .5000
Table 8 (Continued)


n2 = 10
      n1
U0      1      2      3      4      5      6      7      8      9     10
 0  .0909  .0152  .0035  .0010  .0003  .0001  .0001  .0000  .0000  .0000
 1  .1818  .0303  .0070  .0020  .0007  .0002  .0001  .0000  .0000  .0000
 2  .2727  .0606  .0140  .0040  .0013  .0005  .0002  .0001  .0000  .0000
 3  .3636  .0909  .0245  .0070  .0023  .0009  .0004  .0002  .0001  .0000
 4  .4545  .1364  .0385  .0120  .0040  .0015  .0006  .0003  .0001  .0001
 5  .5455  .1818  .0559  .0180  .0063  .0024  .0010  .0004  .0002  .0001
 6         .2424  .0804  .0270  .0097  .0037  .0015  .0007  .0003  .0002
 7         .3030  .1084  .0380  .0140  .0055  .0023  .0010  .0005  .0002
 8         .3788  .1434  .0529  .0200  .0080  .0034  .0015  .0007  .0004
 9         .4545  .1853  .0709  .0276  .0112  .0048  .0022  .0011  .0005
10         .5455  .2343  .0939  .0376  .0156  .0068  .0031  .0015  .0008
11                .2867  .1199  .0496  .0210  .0093  .0043  .0021  .0010
12                .3462  .1518  .0646  .0280  .0125  .0058  .0028  .0014
13                .4056  .1868  .0823  .0363  .0165  .0078  .0038  .0019
14                .4685  .2268  .1032  .0467  .0215  .0103  .0051  .0026
15                .5315  .2697  .1272  .0589  .0277  .0133  .0066  .0034
16                       .3177  .1548  .0736  .0351  .0171  .0086  .0045
17                       .3666  .1855  .0903  .0439  .0217  .0110  .0057
18                       .4196  .2198  .1099  .0544  .0273  .0140  .0073
19                       .4725  .2567  .1317  .0665  .0338  .0175  .0093
20                       .5275  .2970  .1566  .0806  .0416  .0217  .0116
21                              .3393  .1838  .0966  .0506  .0267  .0144
22                              .3839  .2139  .1148  .0610  .0326  .0177
23                              .4296  .2461  .1349  .0729  .0394  .0216
24                              .4765  .2811  .1574  .0864  .0474  .0262
25                              .5235  .3177  .1819  .1015  .0564  .0315
26                                     .3564  .2087  .1185  .0667  .0376
27                                     .3962  .2374  .1371  .0782  .0446
28                                     .4374  .2681  .1577  .0912  .0526
29                                     .4789  .3004  .1800  .1055  .0615
30                                     .5211  .3345  .2041  .1214  .0716
31                                            .3698  .2299  .1388  .0827
32                                            .4063  .2574  .1577  .0952
33                                            .4434  .2863  .1781  .1088
34                                            .4811  .3167  .2001  .1237
35                                            .5189  .3482  .2235  .1399
36                                                   .3809  .2483  .1575
37                                                   .4143  .2745  .1763
38                                                   .4484  .3019  .1965
39                                                   .4827  .3304  .2179
40                                                   .5173  .3598  .2406
41                                                          .3901  .2644
42                                                          .4211  .2894
43                                                          .4524  .3153
44                                                          .4841  .3421
45                                                          .5159  .3697
46                                                                 .3980
47                                                                 .4267
48                                                                 .4559
49                                                                 .4853
50                                                                 .5147
Computed by M. Pagano, Department of Statistics, University of Florida. 
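
The entries of Table 8 can be reproduced by direct counting. The Python sketch below is illustrative only: the names `count_u` and `u_cdf` are hypothetical, and the sketch assumes the null hypothesis under which all orderings of the combined sample are equally likely, with no ties. It uses the recursion obtained by conditioning on which sample contributes the largest observation.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(n1: int, n2: int, u: int) -> int:
    """Number of orderings of n1 + n2 observations for which U equals u.

    U is taken here as the number of (x, y) pairs with x > y, where the
    x's come from the sample of size n1 (an assumed convention; the null
    distribution is symmetric, so the other convention gives the same table).
    """
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # If the largest observation is an x, it exceeds all n2 y's;
    # if it is a y, it contributes nothing to U.
    return count_u(n1 - 1, n2, u - n2) + count_u(n1, n2 - 1, u)


def u_cdf(n1: int, n2: int, u0: int) -> Fraction:
    """Exact P(U <= u0) under the null hypothesis."""
    favourable = sum(count_u(n1, n2, u) for u in range(u0 + 1))
    return Fraction(favourable, comb(n1 + n2, n1))
```

For example, `u_cdf(2, 4, 2)` returns 4/15, or about .2667, which is the tabled entry for n2 = 4, n1 = 2, U0 = 2.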
Table 9 Critical Values of T in the Wilcoxon Matched-Pairs, Signed-Ranks Test; n = 5(1)50

One-sided   Two-sided    n = 5   n = 6   n = 7   n = 8   n = 9  n = 10
P = .05     P = .10          1       2       4       6       8      11
P = .025    P = .05                  1       2       4       6       8
P = .01     P = .02                          0       2       3       5
P = .005    P = .01                                  0       2       3

One-sided   Two-sided   n = 11  n = 12  n = 13  n = 14  n = 15  n = 16
P = .05     P = .10         14      17      21      26      30      36
P = .025    P = .05         11      14      17      21      25      30
P = .01     P = .02          7      10      13      16      20      24
P = .005    P = .01          5       7      10      13      16      19

One-sided   Two-sided   n = 17  n = 18  n = 19  n = 20  n = 21  n = 22
P = .05     P = .10         41      47      54      60      68      75
P = .025    P = .05         35      40      46      52      59      66
P = .01     P = .02         28      33      38      43      49      56
P = .005    P = .01         23      28      32      37      43      49

One-sided   Two-sided   n = 23  n = 24  n = 25  n = 26  n = 27  n = 28
P = .05     P = .10         83      92     101     110     120     130
P = .025    P = .05         73      81      90      98     107     117
P = .01     P = .02         62      69      77      85      93     102
P = .005    P = .01         55      61      68      76      84      92


Table 9 (Continued)

One-sided   Two-sided   n = 29  n = 30  n = 31  n = 32  n = 33  n = 34
P = .05     P = .10        141     152     163     175     188     201
P = .025    P = .05        127     137     148     159     171     183
P = .01     P = .02        111     120     130     141     151     162
P = .005    P = .01        100     109     118     128     138     149

One-sided   Two-sided   n = 35  n = 36  n = 37  n = 38  n = 39
P = .05     P = .10        214     228     242     256     271
P = .025    P = .05        195     208     222     235     250
P = .01     P = .02        174     186     198     211     224
P = .005    P = .01        160     171     183     195     208

One-sided   Two-sided   n = 40  n = 41  n = 42  n = 43  n = 44  n = 45
P = .05     P = .10        287     303     319     336     353     371
P = .025    P = .05        264     279     295     311     327     344
P = .01     P = .02        238     252     267     281     297     313
P = .005    P = .01        221     234     248     262     277     292

One-sided   Two-sided   n = 46  n = 47  n = 48  n = 49  n = 50
P = .05     P = .10        389     408     427     446     466
P = .025    P = .05        361     379     397     415     434
P = .01     P = .02        329     345     362     380     398
P = .005    P = .01        307     323     339     356     373


From “Some Rapid Approximate Statistical Procedures” (1964), 28, F. Wilcoxon and R. A. Wilcox. 
Table 10 Distribution of the Total Number of Runs R in Samples of Size (n1, n2); P(R ≤ a)


                                       a
(n1, n2)      2      3      4      5      6      7      8      9     10
(2, 3)     .200   .500   .900  1.000
(2, 4)     .133   .400   .800  1.000
(2, 5)     .095   .333   .714  1.000
(2, 6)     .071   .286   .643  1.000
(2, 7)     .056   .250   .583  1.000
(2, 8)     .044   .222   .533  1.000
(2, 9)     .036   .200   .491  1.000
(2, 10)    .030   .182   .455  1.000
(3, 3)     .100   .300   .700   .900  1.000
(3, 4)     .057   .200   .543   .800   .971  1.000
(3, 5)     .036   .143   .429   .714   .929  1.000
(3, 6)     .024   .107   .345   .643   .881  1.000
(3, 7)     .017   .083   .283   .583   .833  1.000
(3, 8)     .012   .067   .236   .533   .788  1.000
(3, 9)     .009   .055   .200   .491   .745  1.000
(3, 10)    .007   .045   .171   .455   .706  1.000
(4, 4)     .029   .114   .371   .629   .886   .971  1.000
(4, 5)     .016   .071   .262   .500   .786   .929   .992  1.000
(4, 6)     .010   .048   .190   .405   .690   .881   .976  1.000
(4, 7)     .006   .033   .142   .333   .606   .833   .954  1.000
(4, 8)     .004   .024   .109   .279   .533   .788   .929  1.000
(4, 9)     .003   .018   .085   .236   .471   .745   .902  1.000
(4, 10)    .002   .014   .068   .203   .419   .706   .874  1.000
(5, 5)     .008   .040   .167   .357   .643   .833   .960   .992  1.000
(5, 6)     .004   .024   .110   .262   .522   .738   .911   .976   .998
(5, 7)     .003   .015   .076   .197   .424   .652   .854   .955   .992
(5, 8)     .002   .010   .054   .152   .347   .576   .793   .929   .984
(5, 9)     .001   .007   .039   .119   .287   .510   .734   .902   .972
(5, 10)    .001   .005   .029   .095   .239   .455   .678   .874   .958
(6, 6)     .002   .013   .067   .175   .392   .608   .825   .933   .987
(6, 7)     .001   .008   .043   .121   .296   .500   .733   .879   .966
(6, 8)     .001   .005   .028   .086   .226   .413   .646   .821   .937
(6, 9)     .000   .003   .019   .063   .175   .343   .566   .762   .902
(6, 10)    .000   .002   .013   .047   .137   .288   .497   .706   .864
(7, 7)     .001   .004   .025   .078   .209   .383   .617   .791   .922
(7, 8)     .000   .002   .015   .051   .149   .296   .514   .704   .867
(7, 9)     .000   .001   .010   .035   .108   .231   .427   .622   .806
(7, 10)    .000   .001   .006   .024   .080   .182   .355   .549   .743
(8, 8)     .000   .001   .009   .032   .100   .214   .405   .595   .786
(8, 9)     .000   .001   .005   .020   .069   .157   .319   .500   .702
(8, 10)    .000   .000   .003   .013   .048   .117   .251   .419   .621
(9, 9)     .000   .000   .003   .012   .044   .109   .238   .399   .601
(9, 10)    .000   .000   .002   .008   .029   .077   .179   .319   .510
(10, 10)   .000   .000   .001   .004   .019   .051   .128   .242   .414
Table 10 (Continued)


                                       a
(n1, n2)     11     12     13     14     15     16     17     18     19     20
(5, 6)    1.000
(5, 7)    1.000
(5, 8)    1.000
(5, 9)    1.000
(5, 10)   1.000
(6, 6)     .998  1.000
(6, 7)     .992   .999  1.000
(6, 8)     .984   .998  1.000
(6, 9)     .972   .994  1.000
(6, 10)    .958   .990  1.000
(7, 7)     .975   .996   .999  1.000
(7, 8)     .949   .988   .998  1.000  1.000
(7, 9)     .916   .975   .994   .999  1.000
(7, 10)    .879   .957   .990   .998  1.000
(8, 8)     .900   .968   .991   .999  1.000  1.000
(8, 9)     .843   .939   .980   .996   .999  1.000  1.000
(8, 10)    .782   .903   .964   .990   .998  1.000  1.000
(9, 9)     .762   .891   .956   .988   .997  1.000  1.000  1.000
(9, 10)    .681   .834   .923   .974   .992   .999  1.000  1.000  1.000
(10, 10)   .586   .758   .872   .949   .981   .996   .999  1.000  1.000  1.000


From “Tables for Testing Randomness of Grouping in a Sequence of Alternatives,” C. Eisenhart and 
F. Swed, Annals of Mathematical Statistics, Volume 14 (1943). 
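
The probabilities in Table 10 follow from the standard counting argument for the number of runs R in a random arrangement of n1 elements of one kind and n2 of another. The Python sketch below is illustrative only: the names `runs_pmf` and `runs_cdf` are hypothetical, and it assumes n1 ≥ 1, n2 ≥ 1, and that every arrangement is equally likely.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1: int, n2: int, r: int) -> Fraction:
    """Exact P(R = r) for n1 elements of one kind and n2 of the other."""
    if r < 2:
        return Fraction(0)
    total = comb(n1 + n2, n1)
    k, odd = divmod(r, 2)
    if not odd:
        # r = 2k: k runs of each kind, starting with either kind.
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        # r = 2k + 1: k + 1 runs of one kind and k runs of the other.
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return Fraction(ways, total)


def runs_cdf(n1: int, n2: int, a: int) -> Fraction:
    """Exact P(R <= a)."""
    return sum((runs_pmf(n1, n2, r) for r in range(2, a + 1)), Fraction(0))
```

For example, `runs_cdf(2, 3, 4)` returns 9/10, the tabled value .900 for (n1, n2) = (2, 3) and a = 4.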
Table 11 Critical Values of Spearman's Rank Correlation Coefficient


n α = .05 α = .025 α = .01 α = .005


5 0.900 — — — 
6 0.829 0.886 0.943 — 
7 0.714 0.786 0.893 — 
8 0.643 0.738 0.833 0.881 
9 0.600 0.683 0.783 0.833 
10 0.564 0.648 0.745 0.794 


11 0.523 0.623 0.736 0.818 
12 0.497 0.591 0.703 0.780 
13 0.475 0.566 0.673 0.745 
14 0.457 0.545 0.646 0.716 
15 0.441 0.525 0.623 0.689 


16 0.425 0.507 0.601 0.666 
17 0.412 0.490 0.582 0.645 
18 0.399 0.476 0.564 0.625 
19 0.388 0.462 0.549 0.608 
20 0.377 0.450 0.534 0.591 


21 0.368 0.438 0.521 0.576 
22 0.359 0.428 0.508 0.562 
23 0.351 0.418 0.496 0.549 
24 0.343 0.409 0.485 0.537 
25 0.336 0.400 0.475 0.526 


26 0.329 0.392 0.465 0.515 
27 0.323 0.385 0.456 0.505 
28 0.317 0.377 0.448 0.496 
29 0.311 0.370 0.440 0.487 
30 0.305 0.364 0.432 0.478 


From “Distribution of Sums of Squares of Rank Differ- 
ences for Small Samples,” E. G. Olds, Annals of Mathe- 
matical Statistics, Volume 9 (1938). 
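
As an illustration of how Table 11 might be used, the sketch below computes Spearman's rank correlation coefficient for untied data and compares it with the α = .05 critical value. It is a hedged example: it assumes no ties, assumes the tabled α is an upper-tail level, and copies only the n = 5 to 10 rows of the α = .05 column; the names `ranks`, `spearman_rs`, and `CRIT_05` are hypothetical.

```python
# Alpha = .05 column of Table 11 for n = 5..10 (copied from the table above).
CRIT_05 = {5: 0.900, 6: 0.829, 7: 0.714, 8: 0.643, 9: 0.600, 10: 0.564}

def ranks(values):
    """Ranks 1..n of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid when there are no ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

x = [1.2, 2.5, 3.1, 4.8, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
rs = spearman_rs(x, y)
# Upper-tail test: reject when r_s meets or exceeds the tabled value.
print(rs, rs >= CRIT_05[len(x)])
```

Here the sum of squared rank differences is 4, so r_s = 1 − 24/120 = .8, which falls short of the tabled 0.900 for n = 5.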


Table 12 Random Numbers 


Line/Col. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) 
1 10480 15011 01536 02011 81647 91646 69179 14194 62590 36207 20969 99570 91291 90700 
2 22368 46573 25595 85393 30995 89198 27982 53402 93965 34095 52666 19174 39615 99505 
3 24130 48360 22527 97265 76393 64809 15179 24830 49340 32081 30680 19655 63348 58629 
4 42167 93003 06243 61680 07856 16376 39440 53537 71341 57004 00849 74917 97758 16379 
5 37570 39975 81837 16656 06121 91782 60468 81305 49684 60672 14110 06927 01263 54613 
6 77921 06907 11008 42751 27756 53498 18602 70659 90655 15053 21916 81825 44394 42880 
7 99562 72095 56420 69994 98872 31016 71194 18738 44013 48840 63213 21069 10634 12952 
8 96301 91977 05463 07972 18876 20922 94595 56869 69014 60045 18425 84903 42508 32307 
9 89579 14342 63661 10281 17453 18103 57740 84378 25331 12566 58678 44947 05585 56941 

10 85475 36857 53342 53988 53060 59533 38867 62300 08158 17983 16439 11458 18593 64952 
11 28918 69578 88231 33276 70997 79936 56865 05859 90106 31595 01547 85590 91610 78188 
12 63553 40961 48235 03427 49626 69445 18663 72695 52180 20847 12234 90511 33703 90322 
13 09429 93969 52636 92737 88974 33488 36320 17617 30015 08272 84115 27156 30613 74952 
14 10365 61129 87529 85689 48237 52267 67689 93394 01511 26358 85104 20285 29975 89868 
15 07119 97336 71048 08178 77233 13916 47564 81056 97735 85977 29372 74461 28551 90707 
16 51085 12765 51821 51259 77452 16308 60756 92144 49442 53900 70960 63990 75601 40719 
17 02368 21382 52404 60268 89368 19885 55322 44819 01188 65255 64835 44919 05944 55157 
18 01011 54092 33362 94904 31273 04146 18594 29852 71585 85030 51132 01915 92747 64951 
19 52162 53916 46369 58586 23216 14513 83149 98736 23495 64350 94738 17752 35156 35749 
20 07056 97628 33787 09998 42698 06691 76988 13602 51851 46104 88916 19509 25625 58104 
21 48663 91245 85828 14346 09172 30168 90229 04734 59193 22178 30421 61666 99904 32812 
22 54164 58492 22421 74103 47070 25306 76468 26384 58151 06646 21524 15227 96909 44592 
23 32639 32363 05597 24200 13363 38005 94342 28728 35806 06912 17012 64161 18296 22851 
24 29334 27001 87637 87308 58731 00256 45834 15398 46557 41135 10367 07684 36188 18510 
25 02488 33062 28834 07351 19731 92420 60952 61280 50001 67658 32586 86679 50720 94953 


Table 12 (Continued)


Line/Col. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) 
26 81525 72295 04839 96423 24878 82651 66566 14778 76797 14780 13300 87074 79666 95725 
27 29676 20591 68086 26432 46901 20849 89768 81536 86645 12659 92259 57102 80428 25280
28 00742 57392 39064 66432 84673 40027 32832 61362 98947 96067 64760 64584 96096 98253 
29 05366 04213 25669 26422 44407 44048 37937 63904 45766 66134 75470 66520 34693 90449 
30 91921 26418 64117 94305 26766 25940 39972 22209 71500 64568 91402 42416 07844 69618 
31 00582 04711 87917 77341 42206 35126 74087 99547 81817 42607 43808 76655 62028 76630 
32 00725 69884 62797 56170 86324 88072 76222 36086 84637 93161 76038 65855 77919 88006
33 69011 65795 95876 55293 18988 27354 26575 08625 40801 59920 29841 80150 12777 48501 
34 25976 57948 29888 88604 67917 48708 18912 82271 65424 69774 33611 54262 85963 03547 
35 09763 83473 73577 12908 30883 18317 28290 35797 05998 41688 34952 37888 38917 88050 
36 91567 42595 27958 30134 04024 86385 29880 99730 55536 84855 29080 09250 79656 73211 
37 17955 56349 90999 49127 20044 59931 06115 20542 18059 02008 73708 83517 36103 42791
38 46503 18584 18845 49618 02304 51038 20655 58727 28168 15475 56942 53389 20562 87338 
39 92157 89634 94824 78171 84610 82834 09922 25417 44137 48413 25555 21246 35509 20468 
40 14577 62765 35605 81263 39667 47358 56873 56307 61607 49518 89656 20103 77490 18062 
41 98427 07523 33362 64270 01638 92477 66969 98420 04880 45585 46565 04102 46880 45709 
42 34914 63976 88720 82765 34476 17032 87589 40836 32427 70002 70663 88863 77775 69348 
43 70060 28277 39475 46473 23219 53416 94970 25832 69975 94884 19661 72828 00102 66794 
44 53976 54914 06990 67245 68350 82948 11398 42878 80287 88267 47363 46634 06541 97809 
45 76072 29515 40980 07391 58745 25774 22987 80059 39911 96189 41151 14222 60697 59583 
46 90725 52210 83974 29992 65831 38857 50490 83765 55657 14361 31720 57375 56228 41546 
47 64364 67412 33339 31926 14883 24413 59744 92351 97473 89286 35931 04110 23726 51900 
48 08962 00358 31662 25388 61642 34072 81249 35648 56891 69352 48373 45578 78547 81788 
49 95012 68379 93526 70765 10592 04542 76463 54328 02349 17247 28865 14777 62730 92277 
50 15664 10493 20492 38391 91132 21999 59516 81652 27195 48223 46751 22923 32261 85653 
Table 12 (Continued)


Line/Col. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) 
51 16408 81899 04153 53381 79401 21438 83035 92350 36693 31238 59649 91754 72772 02338 
52 18629 81953 05520 91962 04739 13092 97662 24822 94730 06496 35090 04822 86774 98289
53 73115 35101 47498 87637 99016 71060 88824 71013 18735 20286 23153 72924 35165 43040 
54 57491 16703 23167 49323 45021 33132 12544 41035 80780 45393 44812 12515 98931 91202 
55 30405 83946 23792 14422 15059 45799 22716 19792 09983 74353 68668 30429 70735 25499
56 16631 35006 85900 98275 32388 52390 16815 69298 82732 38480 73817 32523 41961 44437 
57 96773 20206 42559 78985 05300 22164 24369 54224 35083 19687 11052 91491 60383 19746 
58 38935 64202 14349 82674 66523 44133 00697 35552 35970 19124 63318 29686 03387 59846 
59 31624 76384 17403 53363 44167 64486 64758 75366 76554 31601 12614 33072 60332 92325 
60 78919 19474 23632 27889 47914 02584 37680 20801 72152 39339 34806 08930 85001 87820 
61 03931 33309 57047 74211 63445 17361 62825 39908 05607 91284 68833 25570 38818 46920 
62 74426 33278 43972 10119 89917 15665 52872 73823 73144 88662 88970 74492 51805 99378 
63 09066 00903 20795 95452 92648 45454 09552 88815 16553 51125 79375 97596 16296 66092 
64 42238 12426 87025 14267 20979 04508 64535 31355 86064 29472 47689 05974 52468 16834 
65 16153 08002 26504 41744 81959 65642 74240 56302 00033 67107 77510 70625 28725 34191 
66 21457 40742 29820 96783 29400 21840 15035 34537 33310 06116 95240 15957 16572 06004 
67 21581 57802 02050 89728 17937 37621 47075 42080 97403 48626 68995 43805 33386 21597 
68 55612 78095 83197 33732 05810 24813 86902 60397 16489 03264 88525 42786 05269 92532 
69 44657 66999 99324 51281 84463 60563 79312 93454 68876 25471 93911 25650 12682 73572 
70 91340 84979 46949 81973 37949 61023 43997 15263 80644 43942 89203 71795 99533 50501 
71 91227 21199 31935 27022 84067 05462 35216 14486 29891 68607 41867 14951 91696 85065 
72 50001 38140 66321 19924 72163 09538 12151 06878 91903 18749 34405 56087 82790 70925 
73 65390 05224 72958 28609 81406 39147 25549 48542 42627 45233 57202 94617 23772 07896 
74 27504 96131 83944 41575 10573 08619 64482 73923 36152 05184 94142 25299 84387 34925 
75 37169 94851 39117 89632 00959 16487 65536 49071 39782 17095 02330 74301 00275 48280 
Table 12 (Continued)


Line/Col. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
76 11508 70225 51111 38351 19444 66499 71945 05422 13442 78675 84081 66938 93654 59894
77 37449 30362 06694 54690 04052 53115 62757 95348 78662 11163 81651 50245 34971 52924
78 46515 70331 85922 38329 57015 15765 97161 17869 45349 61796 66345 81073 49106 79860
79 30986 81223 42416 58353 21532 30502 32305 86482 05174 07901 54339 58861 74818 46942
80 63798 64995 46583 09785 44160 78128 83991 42885 92520 83531 80377 35909 81250 54238
81 82486 84846 99254 67632 43218 50076 21361 64816 51202 88124 41870 52689 51275 83556
82 21885 32906 92431 09060 64297 51674 64126 62570 26123 05155 59194 52799 28225 85762
83 60336 98782 07408 53458 13564 59089 26445 29789 85205 41001 12535 12133 14645 23541
84 43937 46891 24010 25560 86355 33941 25786 54990 71899 15475 95434 98227 21824 19585
85 97656 63175 89303 16275 07100 92063 21942 18611 47348 20203 18534 03862 78095 50136
86 03299 01221 05418 38982 55758 92237 26759 86367 21216 98442 08303 56613 91511 75928
87 79626 06486 03574 17668 07785 76020 79924 25651 83325 88428 85076 72811 22717 50585
88 85636 68335 47539 03129 65651 11977 02510 26113 99447 68645 34327 15152 55230 93448
89 18039 14367 61337 06177 12143 46609 32989 74014 64708 00533 35398 58408 13261 47908
90 08362 15656 60627 36478 65648 16764 53412 09013 07832 41574 17639 82163 60859 75567
91 79556 29068 04142 16268 15387 12856 66227 38358 22478 73373 88732 09443 82558 05250
92 92608 82674 27072 32534 17075 27698 98204 63863 11951 34648 88022 56148 34925 57031
93 23982 25835 40055 67006 12293 02753 14827 23235 35071 99704 37543 11601 35503 85171
94 09915 96306 05908 97901 28395 14186 00821 80703 70426 75647 76310 88717 37890 40129
95 59037 33300 26695 62247 69927 76123 50842 43834 86654 70959 79725 93872 28117 19233
96 42488 78077 69882 61657 34136 79180 97526 43092 04098 73571 80799 76536 71255 64239
97 46764 86273 63003 93017 31204 36692 40202 35215 57306 55543 53203 18098 47625 88684
98 03237 45430 55417 63282 90816 17349 88298 90183 36600 78406 06216 95787 42579 90730
99 86591 81482 52667 61582 14972 90053 89534 76036 49199 43716 97548 04379 46370 28672
100 38534 01715 94964 87288 65680 43772 39560 12918 86537 62738 19636 51132 25739 56947


Abridged from Handbook of Tables for Probability and Statistics, 2nd edition, edited by William H. Beyer (Cleveland: The Chemical Rubber Company, 1968).
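
One common way to use a random number table such as Table 12 is to read successive digit groups and keep those that correspond to valid unit labels. The sketch below is only an illustration under stated assumptions: units are labeled 1 through N, the leading digits of each five-digit group are used, and out-of-range or repeated labels are skipped. The function name `sample_from_table` is hypothetical; the digits are line 1 of the table.

```python
# Line 1 of Table 12, copied as five-digit strings so leading zeros survive.
LINE_1 = ("10480 15011 01536 02011 81647 91646 69179 "
          "14194 62590 36207 20969 99570 91291 90700").split()

def sample_from_table(groups, population_size, sample_size):
    """Select distinct labels in 1..population_size from table digits."""
    digits = len(str(population_size))  # how many leading digits to read
    chosen = []
    for group in groups:
        label = int(group[:digits])
        # Skip labels outside the population and labels already drawn.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

print(sample_from_table(LINE_1, population_size=500, sample_size=5))
```

With N = 500 the leading three digits are read, so 816, 916, and 691 are skipped and the first five usable labels are 104, 150, 15, 20, and 141.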
ANSWERS


1.5 


1.9 


2.7 


2.9 


2.13 


Chapter 1

a 2.45 - 2.65, 2.65 - 2.85

b 7/30

c 16/30

a Approx. .68

b Approx. .95

c Approx. .815

d Approx. 0

a ȳ = 9.79; s = 4.14

b k = 1: (5.65, 13.93); k = 2: (1.51,
18.07); k = 3: (−2.63, 22.21)

a ȳ = 4.39; s = 1.87

b k = 1: (2.52, 6.26); k = 2: (0.65,
8.13); k = 3: (−1.22, 10)

For Ex. 1.2, range/4 = 7.35; s = 4.14;

for Ex. 1.3, range/4 = 3.04; s = 3.17;

for Ex. 1.4, range/4 = 2.32; s = 1.87.

ȳ − s = −19 < 0

Chapter 2

A = {two males} = {(M1, M2),

(M1, M3), (M2, M3)}

B = {at least one female} = {(M1, W1),

(M2, W1), (M3, W1), (M1, W2), (M2, W2),

(M3, W2), (W1, W2)}

B̄ = {no females} = A; A ∪ B = S;

A ∩ B = ∅; A ∩ B̄ = A

S = {A+, B+, AB+, O+, A−, B−,

AB−, O−}

a P(E5) = .10; P(E4) = .20

b p = .2

a E1 = very likely (VL); E2 =


somewhat likely (SL); E3 =
unlikely (U); E4 = other (O)

b No; P(VL) = .24, P(SL) = .24,
P(U) = .40, P(O) = .12

c .48


тид m m o m 


.21 
.23 


.25 


.27 
.29 
.31 
.33 
.35 


2.15 


2.17 


2.19 


2.27 


84 
a 16% 

b Approx. 95%

a 177

c ȳ = 210.8; s = 162.17

d k = 1: (48.6, 373); k = 2: (−113.5, 535.1); k = 3: (−275.7, 697.3)
68% or 231 scores; 95% or 323 scores
.05
.025
(0.5, 10.5)
a (172 − 108)/4 = 16
b ȳ = 136.1; s = 17.1
c a = 136.1 − 2(17.1) = 101.9;
b = 136.1 + 2(17.1) = 170.3


.09 

.19 

.08 

.16 

14 

84 

(V1, V1), (V1, V2), (V1, V3),

(V2, V1), (V2, V2), (V2, V3),

(V3, V1), (V3, V2), (V3, V3)

b If equally likely, all have
probability of 1/9.

c P(A) = 1/3; P(B) = 5/9;
P(A ∪ B) = 7/9;
P(A ∩ B) = 1/9

a S = {CC, CR, CL, RC, RR, RL,
LC, LR, LL}

b 5/9 

c 5/9 


ъс cm on 




2.53 


2.55 


2.57 


2.59 


2.61 


2.63 
2.65 
2.67 


2.71 


2.73 


c 1/15 

a 3/5; 1/15

b 14/15; 2/5

c 11/16; 3/8; 1/4


18,252 

8515 required 
Yes 

4/19,600 
276/19,600 
4140/19,600 
15180/19,600 
60 sample points 
36/60 = .6 


(0) 
(2°) (8) Co) =" 


(4 × 12)/1326 = .0362
а .000394 
b .00355 
364" 
365" 
b .5005 
1/56 
5/162 
P(A) = .0605 
.001344 
.00029 
1/3 
1/5 


соо о со 2078 


£ 


= 


Ta „ш аз осо ооо сро TD 
aH 
Qu ot ы 
EN 


c 
d 

2.97 a .999 
b 


2.101 .05 
2.103 a .001 


2.105 .90 

2.109 P(A) > .9833 

2.111 .149 

2.113 (.98)3(.02) 

2.115 (755

2.117 a 4(.5)^4 = .25
b (.5)^4 = 1/16

2.119 a 1/4 


2.121 


2.125 1/12 
2.127 a .857 


2.129 4 
2.133 .9412 
2.135 


2.137 
b 3/20 
2.139 P(Y = 0) = (.02)^3;
P(Y = 1) = 3(.02)^2(.98);
P(Y = 2) = 3(.02)(.98)^2;
P(Y = 3) = (.98)^3
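
The four probabilities listed for 2.139 follow the binomial pattern C(3, y)(.98)^y(.02)^(3−y). As a quick numerical check, here is a minimal Python sketch; the choice of n = 3 trials with success probability .98 is inferred from the listed answer rather than stated here, and `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(n, p, y):
    """P(Y = y) for a binomial random variable with n trials, success prob p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Assumed from the answer above: n = 3, p = .98.
probs = [binomial_pmf(3, 0.98, y) for y in range(4)]
for y, pr in enumerate(probs):
    print(f"P(Y = {y}) = {pr:.6f}")
```

The four values sum to 1, as they must for a complete probability distribution.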
2.141 P(Y = 2) = 1/15; P(Y = 3) = 2/15;
P(Y = 4) = 3/15; P(Y = 5) = 4/15;
P(Y = 6) = 5/15
2.145 18! 
2.147 .0083 
2.149 a 4 
b 6 
c 25 
2.151 4[p^4(1 − p) + p(1 − p)^4]
2.153 .313 
2.155 a 5 
b 15 
c 10 
d .875 


2.157 
2.161 
2.163 


2.165 
2.167 


2.169 


3.1 


3.3 


3.5 


3.7 


3.9 


3.11 


3.13 


3.15 


021 
P(R < 3) = 12/66 
P(A) = 0.9801 
P(B) = .9639 
916 


P(Y = 1) = 35/70 = .5;

P(Y = 2) = 20/70 = 2/7;
P(Y = 3) = 10/70;

P(Y = 4) = 4/70; P(Y = 5) =
a (24)^3 = 13,824


Chapter 3 
P(Y = 0) = .2, P(Y = 1) = .7,
P(Y = 2) = .1
а oO 
PQ) = =,р() e PA = > 
0 = 2 ma- eyes 
P = 56:8 EX 
3! 6 
р(0) = 21 pP x ad 
dixi иу. 
a P(Y = 3) = .000125,


P(Y = 2) = .007125,
P(Y = 1) = .135375,
b 3456/13,824 = .25
2.173 .25 
2.177 a .364 
b .636
c (49/50)^n ≥ .60, so n is at most 25


2.179 a 20(1/2)^6 = .3125


1 10 
1/70 b (5)
P(Y = 2) = .3894,
P(Y = 3) = .1406


c P(Y = 1) = .3594

d μ = E(Y) = 1.56, σ² = .7488,
σ = 0.8653

e (−.1706, 3.2906),
P(0 ≤ Y ≤ 3) = 1

3.17 μ = E(Y) = .889,

σ² = V(Y) = E(Y²) − [E(Y)]² = .321,

σ = 0.567, (μ − 2σ,

μ + 2σ) = (−.245, 2.023),

P(0 ≤ Y ≤ 2) = 1


Р(Ү = 0) = .857375 тсе 
c P(Y > 1) = .00725 3.21 13,800.388 
12 1 : 
PZ=0= 2, Te =1у= É, 3.23 $31 
A A 3.25 Firm I: E (profit) = $60,000 
Р(Х =2)= 7 Р(Х =3) = 55 E(total profit) = $120,000 
aie 2744 3.27 $510 
Pos 3375" 3.35 .4; .3999 
Виет, 3.39 a .1536; 
s b .9728 
d ici iE 3375 3.41 .000 
3.43 a .1681 
P(Y = 3) = —Z2-2X-Y, 
\ = sys И b .5282 
beue pede 3.45 P(alarm functi 0.992 
io 105° = 1) = 125" чу (alarm functions) = 0. 
49 a .151 
( ) 125” ‹ 3) 125 b .302 
7 27 
EY) = -, EY) = =, VY) = =, 3.51 а .51775 
$* 4 16 b 4914 
созд 3.53 a 0156 
а P(Y =0) = .1106, b .4219 
Р(Ү = 1) = .3594, с 25% 




3.57 
3.59 
3.61 


3.67 
3.69 
3.73 


3.75 


3.81 
3.83 


3.87 


3.91 
3.93 


3.95 
3.97 


3.99 


3.101 


3.103 
3.105 


3.107 


3.109 


3.111 


$185,000 
$840 


a .672 
b .672 
c 8 


.07203 
Y is geometric with p = .59


a .009
b .01


a .081 

b .81 

2 

1 /n—1Ny 

n n 

g( lj. 2180) 
Y 1= р 

$150; 4500 


а .04374 
b .99144


p(y) = [y!/((r − 1)!(y − r + 1)!)] p^r q^(y+1−r),
y = r − 1, r, r + 1, ...


42 

b 7143

c μ = 1.875,
σ = .7087


hypergeometric with N = 6, n = 2,


and r = 4.
a .0238
b .9762
c .9762
ul st 
БЕТА apr 
222— 
pQ) 30 
5 15 
b р(0) = =, р(1) = =, 
pO) 30 pa T 


3.113 
3.115 
3.117 


3.119 
3.121 


3.123 
3.125 
3.127 
3.129 
3.131 
3.133 
3.135 
3.137 
3.139 
3.141 
3.149 


3.151


3.153 


3.155 


3.167 


3.169 


3.171 
3.173 


р(2) = Z р(3) = = 

30 30 
P(Y < 1) =.187 

1 3 1 
р(0) = s p(l) = s р(2) = 5 


а P(Y = 0) = .553 

b E(T) = 9.5, V(T) = 28.755,
σ = 5.362

.016 


a .090 
b .143 
c .857 
d 241 


.1839 

E(S) = 7, V(S) = 700; no 
.6288 

23 seconds 

5578 

1745 

9524 

1512 

40 

$1300 

Binomial, n = 3 and p = .6 


Binomial, n = 10 and p = .7,
P(Y ≤ 5) = .1503
a Binomial, n = 5 and p = .1
b Geometric, p = .5
c Poisson, λ = 2

7 
a E(Y)- 


b V(Y)= 


“ol лоо! 


c р(1 = (2) = 2 aes 
p = 6 р = 6? P T 6 
.64 
C —10 
d p(−1) = 1/(2k²),
p(0) = 1 − (1/k²), p(1) = 1/(2k²)
(85, 115) 
а р(0) = 


cm 


3 3 
p)- gz rO = 


o| —oo0| = 


р(3) = 


c E(Y) = 1.5, V(Y) = .75,
σ = .866


3.175 а 384 
b 5.11 


3.177 (61.03, 98.97) 

3.179 No, P(Y ≥ 350) ≤ 1/(2.98)² = .1126.
3.181 


p = Fraction defective  P(acceptance)
0    1
.10  .5905
.30  .1681
.50  .0312
1.0  0
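
The tabled acceptance probabilities agree numerically with (1 − p)^5, that is, with accepting a lot only when a sample of five items contains no defectives. That reading (sample size 5, acceptance number 0) is inferred from the numbers and is not stated here, so the sketch below should be treated as an assumption-laden check, with `acceptance_probability` a hypothetical name.

```python
def acceptance_probability(p, sample_size=5):
    """P(no defectives in the sample) when each item is defective with prob p.
    sample_size = 5 is an assumption inferred from the tabled values."""
    return (1 - p) ** sample_size

for p in (0.0, 0.10, 0.30, 0.50, 1.0):
    print(f"p = {p:.2f}  P(acceptance) = {acceptance_probability(p):.4f}")
```

Plotting these points against p would trace the operating characteristic curve implied by the table.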


опо = ы 


3.185 2277 


Not unlikely 


.023 
1.2 
$1.25 


3.189 1 − (.99999)^10,000
3.191 v(y)— 4 
3.193 .476 


3.187 


ona of 


a 


Chapter 4 


4.7 a P(2 ≤ Y < 5) = 0.591,

P(2 < Y < 5) = .289, so
not equal

b P(2 ≤ Y ≤ 5) = 0.618,
P(2 < Y ≤ 5) = 0.316, so
not equal

c Y is not a continuous random
variable, so the earlier results
do not hold.


4.9 a Y is a discrete random variable
b These values are 2, 2.5, 4, 5.5, 6, 
and 7. 
c Q-1 Q.5) — І 
diu MEET. 
(4) — 2 (5.5) = : 
p E 16°” sc = 8” 
(6) = : (7) = 2 
р 16°? — 16 
ds s 
4.11 a c=- 
2 
у? 
b Fore 402y22 


3.195 


3.197 


3.199 


3.201 
3.203 


3.205 
3.207 


3.209 
3.211 
3.213 


3.215 


4.13 


4.15 


4.17 


a 
b 
a 
b 
a 
b 
.982
P(W-1)21-e? 
.9997 

п=2 


300 
.037


(18.35, 181.65) 


a E[Y(t)] = k(e?* — e") 

b 3.2974, 2.139 

.00722 

a р(2) = .084, P(Y < 2) = .125 

b P(Y > 10) = .014 

.0837 

3 

a .1192 

b 117 

a n[1 + k(1 − .95^k)]

b g(k) is minimized at k = 5 and
g(5) = .4262.

c .5738N
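
The minimizing group size in part b can be confirmed by direct search. This sketch assumes g(k) is the expected number of tests per item, g(k) = 1/k + 1 − .95^k, obtained by dividing the per-group expectation 1 + k(1 − .95^k) by k; the search range 2 to 20 is an arbitrary illustrative choice.

```python
def expected_tests_per_item(k, p_negative=0.95):
    """g(k) = 1/k + 1 - p_negative**k: one pooled test per group of k,
    plus k individual tests whenever the pooled test is positive."""
    return 1 / k + 1 - p_negative**k

# Search an illustrative range of group sizes for the minimizer.
best_k = min(range(2, 21), key=expected_tests_per_item)
print(best_k, round(expected_tests_per_item(best_k), 4))
print(round(1 - expected_tests_per_item(best_k), 4))  # fraction of tests saved
```

The saving 1 − g(5) is the coefficient of N in part c.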


75 
75 
0 у< 0 
2 
X 
= O<y<l 
F=} 2 | 
—— 1 < 1.5 
peus ауе 
1 у> 1.5 
125 
S75 


For b > 0, f(y) = 0; also, 
ffoysa 


AER А 
Е(у) = 1 — –, for y > b; 


0 elsewhere. 
b 


(b+c) 
(b+c) 




4.19 


4.21 
4.25 
4.27 


4.29 


4.31 
4.33 


4.37 
4.39 


4.45 


4.47 


4.49 


4.51 
4.53 


4.55 


4.59 


d F(−1) = 0, F(0) = 0, F(1) = 1


M 
16 
104 
pl 
123 
0 y<0 
_) 25 O<y<2 
а fo) 125у 2<y<4 
0 y>4 
7 
Re 
16 
13 
с SRM 
16 
7 
4:7 
9 
E(Y) = .708, V(Y) = .0487 


E(Y) = 31/12, V(Y) = 1.160 

$4.65, .012 

E(Y) = 60, V(Y) = = 

E(Y) = 4

a E(Y) = 5.5, V(Y) =

b Using Tchebysheff’s theorem, 
the interval is (5, 6.275). 


с Yes; P(Y) = .5781 
E(Y) =0 
5; 25 
2 
a PY <22) === 4 
1 
b P(Y > 24) = rm = .2 
3 
а Р(Ү> 2) = 
(У > 2) = 1 
4 
b Co + cy [5+ J 
3 
4 
1 
| 
8 
1 
b — 
8 
Л 
4 
2 
3 
b 42.015, V(Y) = .00041 
E(=pD') = 00000657, 
? 
P 2 
v(% D) = 00035257 
а =0 


4.63 
4.65 
4.67 
4.69 
4.71 
4.73 


4.75 
4.77 


4.87 
4.89 
4.91 
4.93 
4.97 


4.99 
4.101 


4.103 


4.105 


4.107 


4.109 
4.111 


4.123 


4.125 


4.129 
4.131 


4.133 


b z0 = 1.10

c z0 = 1.645

d z0 = 2.576

a P(Z > 1) = .1587

b The same answer is obtained.

$425.60

μ = 3.000 in.

.2660 

.9544 

.8297 

.406 

960.5 mm 

= 7.301 

0.758 

22.2 

фоѕ = .70369. 

dos = .35185 

p-.8 

P(Y x 1.7) = .8806 

.1353 

460.52 cfs 

.5057 

1936 

.3679 

a .7358 

a E(Y)=1.92 

b P(Y > 3) = .21036 

d P(2 < Y < 3) = .12943

E(A) = 200π, V(A) = 200,000π²

a E(Y) = 3.2, V(Y) = 6.4

b P(Y > 4) = .28955

a (0, 9.657), because Y must
be positive. 

b P(Y < 9.657) = .95338 

E(L) = 276, V(L) = 47,664 


oa c 


Tate oe oe Tat 


d уйг (e+ 3) reni =o 
1 А i Lu (a — 5) 
: #@—1) nas NAI 
Шо > =, 
2 TA 
ifa > 
a k=60 
b фо; = 0.84684 
1 
Е(Ү) = o 5s 
E(C) = —, V(C) = 29.96 
a 75 
b 2357 
a с= 105 
3 0 0
bu-- LAN 
P Ay! 3 О<у<.5 
с o=.1614 (4у – 1)/3 .5<у<1 
d .02972 1 yz1 
4.139 туй) = expit(4— 31) + (1/2) 0*0] b (у) =0.25Fi(y) + 0255) 
normal, E(X) = 4 — Зи, V(X) = 90°, с E(Y) = .533, V(Y) = .076 
uniqueness of moment-generating 4.161 $, = 85.36 
и" 4.163 1 – (.927? = .3155 
4141 m) = =a 4.165 a c=4 
4.143 об, —— b Е(Үу=1,У(Ү)= 
2 — 
4.145 a 3 c m(t) т 3027 52 
1 4.167 E(Y*) = Го + 8)Г@ +@) 
(t+ 1) aL ROT Cp 
c 1 4.169 e?52.0 
4.147 s=: 4.171 a EQ) = 5, VW) = 5 
4.149 1 b 1—e^ 


4.151 The value 2000 is only .53 standard 4.473 f(r)-2Azre^7",r > 0 
deviation above the mean. Thus, we 4.175 42-1414 


would expect C to exceed 2000 4.179 k=(.4)'? = .7368 
fairly often. 4.181 m(t) = exp(t?/2); 0; 1 
4.153 (6.38,28.28) 4.183 a E(Y) = 598.74 g 
4.155 $113.33 V(Y) = e? (el6 — 1)107* 
4.157 a F(x)= b (0, 3,570,236.1) 
0, x <0 c .8020 
(1/100)е*/1%0, O<x<200 4.187 a е725 = .082 
1, x > 200 b .0186 
b 8647 4.189 E(Y) = 0. Also, it is clear that 
0 у<0 п—1 
! 4.191 c 1— e% 
"gs 40:9 qus 4.193 150 
1 у> .5 4.195 а 12 
Р(у) = b ж = 120 
Chapter 5 
5.1 a yı 0 < yi, 0 € yo, and yı + y < 3. 
01 2 5.5 a .1065 
oli 2 1 b 5 
aces gm 5.7 a .00426 
»z 1$ 5 0 b .8009 
2110 0 5.9 a k=6 
2 31 
еба 
b FA, 
LR 5.11 а 2 
Ble Y. =) Р 
pg n — , where b 1 


3 


aad 
м4 
5.13


5.15 


5.17 
5.19 


5.21 


5.23 


5.25 


5.27 


5.29 


5.31 


£ 


c 


ъ „с^ TF с 


a 


oo +40 0с 


У 0 1 2 


Pi) E à 5 


No
Hypergeometric with N = 9,
n = 3, and r = 4.


з 3 
Воз) = у— qni 0<у<1 


Defined over ух < yı < 11у > 0 
1 


3 

fiO) =e, y > 0; 

hO =e, у > 0 

P(1 < ү < 2.5) = Р(1 < Y, < 

2.5) = ет! — e?? = .2858 

у > 0 

ХО») = ЛО) =e, у> 0 
Оу) = hO) =e, y > 0 
same 

same 

АО) =3d-y),0<y <l; 
a = 6у(1- у), 0 < у <1 


63 
ор) = —, OX y у, 
: y2 
ify <1 

2(1 — у») 
foy) = яси” 

(1— yi) 
yz»zlif»z0 


4 
fh0»22ü-y)0z»zk 
ЛО) = 1- [yı], for 

it <y <l 


3 
ЛО) = 20y: (1 — y1), 0x 
yı <l 


5.33 


5.35 
5.37 


5.41 


5.45 
5.47 
5.51 


5.53 
5.55 
5.57 
5.59 
5.61 


5.63 
5.65 
5.69 


5.71 


5.73 
5.75 


5.77 


b ho: = 
150 + y2)?¥3, —1<y <0 
15(1— y) y}, O<y <1 


с fy) = 3320 — y), 
for y = 1 < y < 1- y 


d 5 
а fig = уе", у = 0; 
РО) =e”, у> 0 


b Рур) 679179, y, > уз 
с 05у) = Шу, OS ур у 
5 


Dependent 

a fOr, у) = fiO) Оу») so that 
Y, and Y; are independent. 

b Yes, the conditional probabilities 
are the same as the marginal 
probabilities. 

No, they are dependent. 

No, they are dependent. 

No, they are dependent. 

No, they are dependent. 

Yes, they are independent. 


4 
Exponential, mean 1 


1 
а /(у.у) = (5) e7012)/3, 


yı>0, у > 0 
b PY +Y < 1)= 


4 
1— 267 = .0446 


£ 
| 


.0249

.0249

2 

They are equal. 

11 

4° 2 3 
EQ?) = 1/10, VY) = =, 


cC о ооо аъ» = 


E(Y7) 2 VQ) : 
2.) — s у= A 
5 10 20 


Cc —— 
4 


5.79 
5.81 
5.83 
5.85 


5.87 


5.89 


5.91 
5.93 


5.95 


5.97 


5.99 
5.101 
5.103 


5.105 
5.107 


5.109 
5.113 


$9 ——oco 


E(Y,) = E(Y?) = 1 (both 
marginal distributions are 
exponential with mean 1) 
b Vv(y)-vopzi 
с E(Yi - Y) 20 

a 
d E(Y;iY)21- p% 


а 
Cov(Yi, Y) = – = 


4 
ое 
S ( p 2 
а E(Yi- Yi) = и +v 
b V(Y; + Yo) 22v + 2v 
Cov(Y; №) = zt. As the value of Yi 
increases, the value of Y, tends to 
decrease. 
Соу(Ү Y2) =0 
а 0 
b Dependent 
c 0 
d Not necessarily independent 


The marginal distributions for Y; 

and Y> are 

591 —1 0 1 y2 0 1 
Gi 111 ( 2 1 

PA 333 2202) 3 3 

Соу(Ү, Y2) =0 

a 2 


b Impossible 

С 4 (a perfect positive linear 
association) 

d —4 (a perfect negative linear 
association) 

0 

a = 

4 
Е(ЗҮ, + 4Ү, m 6Y3) = —22, 
vor + AY, — 6Y3) = 480 


Qa 


9 

EY, + Ү›) = 2/3 and 
Ү(Ү + №) = 18 
(11.48, 52.68) 


E(G) = 42, V(G) = 25; the value $70 


70 — 42 


1s = 7.2 standard deviations 


above the mean, an unlikely value. 


5.115 


5.117 


5.119 


5.121 


5.123 
5.125 


5.127 


5.133 


5.135 


5.137 
5.139 


5.141 
5.143 


5.145 


5.147 
5.149 


5.157 


5.161 
b V(Y) = 38.99
c The interval is 14.7 ± 2√38.99 or
(0, 27.188) 

Pi = P2, 
N-—n 
n(N — 1) 

.0823 


а 
b Ee =. ve 
у= 5 = 7 


[pi + рз — (pi — pi] 


n 
Cov(Y?, Үз) = E 


2n 
EY — Үз) = 0, V Y2 — Y) = з 
.0972 
.2; .072 
.08953 
.046 
.2262 
.2759 
.8031 


c v со m» c» aa 


Tyo uT & 


À 21? 
EY) = =, VV) = 3 


my(t) = (1 — 0), E(U) = 0, 
VU) =1 


a fu) 23y50xy <1 
3 

Jo Soe y3),0<y<1 
23 
44 
с РОУ) = 
d 35 
р(у) = | 
aca i 

y 6+1 6+1)” 
y=0,1,2,... 9 _ 
E(Ȳ − X̄) = μ1 − μ2, V(Ȳ − X̄) =
σ1²/n + σ2²/m


2yi 
— x 2 Sy <1 
(1 — y3) 
5.163


6.1 


6.3 


6.5 


6.7 


6.9 


6.11 


6.13 
6.15 
6.17 


6.25 
6.27 


b Е(у, у) = 
yiyall — a(l — yi) (1 — y»] 
с Ру, у) = 
1—о[(1— 231) (1 2у›)], 
0< у < 1,0< у <1 


Chapter 6 


l-u 
-l<u<l 


, -l<u<l 


с Ja —1,0<и<1 
E(U)) = —1/3, E(U2) = 
1/3, E(Us) = 1/6 

e EQY-1) = -1/3, Е(1—2Ү)= 
1/3, E(Y2) = 1/6 


b (ш = 
(u+4)/100, —4<и<6 
1/10, б<и=< 11 
C 5.5833 
1 [u-3\ 
fol) = z( 2 ) , 
5<u>53 i 
= —1/2„—и/2 
а u) = иге 
fuu) FA 
и> 0 


b U has a gamma distribution with
α = 1/2 and β = 2 (recall that
Γ(1/2) = √π).

f_U(u) = 2u, 0 < u < 1

E(U) = 2/3

E(Y1 + Y2) = 2/3

f_U(u) = 4ue^(−2u), u > 0, a gamma
density with α = 2

and β = 1/2

b E(U) = 1, V(U) = 1/2


шо св 


fu(u) = Буш) = Se"? u > 0 
[-In(1 — U)]'? 
оу! 
а fy)=— Os ys 
b rey 


с у= 4,/и. The values are 2.0785, 
3.229, 1.5036, 1.5610, 2.403. 
f_U(u) = 4ue^(−2u) for u > 0


2. lu | 
a }№(у) = gu ""/P. w > 0, which 
is Weibull density with m = 2.
k 
b E-r (5 3 ) gia 


5.165 a 


6.29 


6.31 
6.33 


6.35 
6.37 


6.39 
6.43 


6.45 
6.47 
6.51 


6.53 


6.55 
6.65 


d Choose two different values for a 
with —1 <a < 1. 

(pie + pre? + pe^)" 

b m(t, 0,0) 

с Cov(X,, X5) = —npip» 


a fion = 

1/2 ,—w/kT 

r3) any? e w>0 
b E(W)= Se 
2 

fulu) = da и> 0 
fu (и) = 4(80 — 31и + Зи?), 
45 <и < 5 
Р (и) = – (и), 0 <их<1 


a m_Y1(t) = 1 − p + pe^t

b m_W(t) = E(e^(tW)) = [1 − p + pe^t]^n

f_U(u) = 4ue^(−2u), u > 0

a Ȳ has a normal distribution
with mean μ and variance σ²/n

b P(|Ȳ − μ| ≤ 1) = .7888

c The probabilities are .8664, .9544,
.9756. So, as the sample size
increases, so does the probability
P(|Ȳ − μ| ≤ 1).

c — $190.27 

P(U > 16.0128) = .025 

The distribution of Y1 + (n2 − Y2) is

binomial with n1 + n2 trials and success

probability p = .2

а Binomial (nm, p) where 
nj =m 

b Binomial (ni = no +-+-Mn, р) 

с Hypergeometric (r = n, 
N=n, ncn) 

Р(Ү > 20) = .077 

а (и, и) = 
ieee 


2л 
1 


xt 
b E(U1) = E(Z1) = 0,
E(U2) = E(Z1 + Z2) = 0,
V(U1) = V(Z1) = 1,
V(U2) = V(Z1 + Z2) =
V(Z1) + V(Z2) = 2,
Cov(U1, U2) = E(Z1²) = 1


(2и2—2и 1 из+из)/2 


6.69 


6.73 


6.75 


6.77 a 


6.81 
6.83 
6.85 
6.87 


7.9 


7.11 
7.13 
7.15 


7.17 


7.19 
7.21 


7.27 


c Not independent since
ρ ≠ 0.

d This is the bivariate normal
distribution with μ1 = μ2 = 0,
σ1² = 1, σ2² = 2, and ρ = 1/√2.


а (у. р) = 55.01 > l, 
ys 122 


e No
а golu) =2u,0<u<1 
b E(U2) = 2/3, V (U2) = 1/18 
(10/15)? 
п! 
(j — 1)!(k — 1 — jn — k)! 
y Di — yO — y~ 
Ө" 
O<yj<y<0 
(n —k-- Dj ө? 
(п + 1)?(n+ 2) 
(п = К+) +1006 Ј),, 
(п + 1)2(п + 2) 
b 1-е 
1 (.5)" 
5 
а gn0)2e979,yz4 


b E(Yay) = 5 


, 


Chapter 7 


a .7698 

b For n = 25, 36, 49, and 64, the
probabilities are (respectively) 
.8664, .9284, .9642, .9836. 

C The probabilities increase with n. 

d Yes 
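
The probabilities listed for 7.9 are consistent with P(|Ȳ − μ| ≤ .3) for a normal population with σ = 1, which equals 2Φ(.3√n) − 1. The tolerance .3 and σ = 1 are assumptions inferred from the listed values (they reproduce .7698 at n = 16) and are not given here; `prob_within` is an illustrative name.

```python
from math import erf, sqrt

def prob_within(tolerance, sigma, n):
    """P(|Ybar - mu| <= tolerance) when Ybar ~ Normal(mu, sigma^2 / n).
    Uses 2*Phi(z) - 1 = erf(z / sqrt(2)) with z = tolerance * sqrt(n) / sigma."""
    z = tolerance * sqrt(n) / sigma
    return erf(z / sqrt(2))

# Assumed inputs: tolerance .3 and sigma 1, inferred from the answers above.
for n in (16, 25, 36, 49, 64):
    print(n, round(prob_within(0.3, 1.0, n), 4))
```

Values computed this way can differ from the printed ones in the fourth decimal place, since the printed answers were presumably read from a rounded normal table.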

.8664 

.9876 

a E(X̄ − Ȳ) = μ1 − μ2

b V(X̄ − Ȳ) = σ1²/m + σ2²/n

c The two sample sizes should be at
least 18.

P(ΣZi² ≤ 6) = .5768

P(S? > .065) = .10 

b = 2.42 

a = .656 

.95 

.17271 

.23041 

.40312 


осо о св 


6.89 


6.93 


6.95 


6.97 


6.101 


6.103 


6.105 


6.107 


6.109 


7.31 


7.35 


7.39 


7.43 
7.45 
7.47 
7.49 
7.51 
7.53 
7.55 
f(r) = n(n − 1)r^(n−2)(1 − r),


0 ≤ r ≤ 1
fo) = 3 (Fe-w)osw sl 
a fu,(u) =} 2 


b fy,(u) =ue",0 <u 
с Same as Ex. 6.35. 
p(W = 0) = p(0) = .0512,
p(1) = .2048, p(2) = .3264,
p(3) = .2656, p(4) = .1186,
p(5) = .0294, p(6) = .0038,
p(7) = .0002
f_U(u) = 1, 0 < u < 1. Therefore, U has
a uniform distribution on (0, 1).
1 
л(1+ uy)’ 


wd — и), 0 <и <1 


oO <и < oo 


В(о, В) 
—— 0 <и=<1 
" < 

paag *Y* 
— 1<и<9 
8/u Mos 
PU = C, — Сз) = 4156; 
P(U = С — Сз) = .5844 


5.99, 4.89, 4.02, 3.65, 3.48, 3.32 
13.2767 
13.2767/3.32 ~ 4 
E(F) = 1.029 
V(F) = .076 
3 is 7.15 standard deviations above 
this mean; unlikely value. 
normal, E(θ̂) = θ =
c1μ1 + c2μ2 + ··· + ckμk

A Л? 2 
vô) = (+++ 

Пі n» Nk 

b χ² with n1 + n2 + ··· + nk − k df
c t with n1 + n2 + ··· + nk − k df
.9544 
.0548 
153 
.0217 
664 
b Y is approximately normal: .0132. 
a random sample; approximately 1. 
b 1271 


© c» ans 


£ 
7.57
7.59 
7.61 
7.63 
7.65 


7.67 


7.71 


7.73 
7.75 
7.77 
7.79 


7.81 


8.3 


8.5 
8.7 
8.9 
8.11 
8.13 
8.15 


8.17 


8.19 
8.21 
8.23 


8.25 


8.27 
8.29 


.0062 

.0062 

n = 51

56 customers

a Exact: .91854; normal
approximation: .86396.

a n = 5 (exact: .99968;
approximate: .95319); n = 10
(exact: .99363; approximate:
.97312); n = 15 (exact: .98194;
approximate: .97613); n = 20
(exact: .96786; approximate:
.96886)

a n > 9

b n > 14, n > 14, n > 36, n > 36,
n > 891, n > 8991

8980 

.7698 

61 customers 

а Using the normal approximation: 
.7486. 

b Using the exact binomial 
probability: .729. 

a .5948 


Chapter 8 


a B(θ̂) = aθ + b − θ = (a − 1)θ + b


b Let θ̂* = (θ̂ − b)/a
a MSE(θ̂*) = V(θ̂*) = V(θ̂)/a²


05 —C 
a> 

й o? +02 — 2c 
Ү—1 

0, — 90; + 54 


b vals = DIY/w[1 — (Y/n)] 


Зп – 1 P 


"uu 2 А 
МЕ (Зп — Dn — 5^ 


b 
а (1—-2p)/( + 2) 
b 
c 


пр(1— р) + (1 —2py 
(n + 2)? 
р will be close to .5. 
MSE(0) = 8? 
11.5 ± .99
11.3 ± 1.54
13 ± 1.7
17 ± .08
−.7
404
.601 ± .031
−.06 ± .045


шо ona an 


7.83 


7.85 
7.87 
7.89 
7.91 


7.95 


7.97 
7.101 
7.103 
7.105 


8.31 
8.33 
8.35 


8.37 
8.39 
8.41 


8.43 
8.45 


8.47 
8.49 
8.57 
8.59 
8.61 
8.63 
8.65 
8.67 


8.69 
8.71 


b With p = .2 and .3, the
probabilities are .0559 and .0017 
respectively. 

a .36897 

b .48679 

8414 

.0041 

μ = 10.15

Since X, Y, and W are normally

distributed, so are X̄, Ȳ, and W̄.


μ_U = E(U) = .4μ1 + .2μ2 + .4μ3


σ_U² = V(U) = .16(σ1²/n1) + .04(σ2²/n2) + .16(σ3²/n3)

a F with num. df = 1, denom. df = 9
b F with num. df = 9, denom. df = 1
c c = 49.04
b .1587 
.8413 


.1587 
.264 


a −.03 ± .041
7 ± .205
a 20 ± 1.265
b −3 ± 1.855, yes
1020 ± 645.1
2Y 2Y
Ce ‚71072.
(Y²/5.02389, Y²/.0009821)
Y²/.0039321
Y²/3.84146
[Yo)1(95)-/"
Y/.05132
80%
(2.557, 11.864)
(3.108, 6.785)
.51 ± .04
a .78 ± .021
(15.46, 36.94)
a .78 ± .026 or (.754, .806)
a .06 ± .117 or (−.057, .177)
a 7.2 ± .751
b 2.5 ± .738
.22 ± .34 or (−.12, .56)
n = 100


бобосо со сь 


8.73 
8.75 
8.77 
8.79 


8.81 
8.83 


8.85 
8.87 
8.91 
8.93 


8.95 


8.99 а 


8.101 


9.1 


9.3 b 


9.5 
9.7 
9.9 
9.23 
9.25 


9.31 
9.35 


9.47 


9.57 
9.59 


9.61 


9.63 


9.69 


n = 2847 
n = 136 

n = 497 

a n = 2998 

b n= 1618 
60.8 ± 5.701

a 3.4+37

b 743.32

−1 ± 4.72
(—.624, .122) 
(—84.39, —28.93) 


ee 4 3 
а 2X¥+Y+1.960,/-+ — 
n т 
e te 4 3 
b 2X + Y + t,5S,| — + —, where 
n m 


gà - YY + 1/3 (xX; - Xy 


n+m—2 
(.227, 2.196) 


(п — 89? 
Res 
= 1)5? 
b (n 3 
Xa 
s² = .0286; (.013, .125)


Chapter 9 


1/3; 2/3; 3/5 

12n? 
(п + 2)(п + 1)? 

п—1 

1/п 

a Х=1 

с need Var(Xz; — Хз) < oo 

b .6826 

c No 

aß 


а ў, is unbiased for u. 


= 1 п 2. 
b V(¥,) = n 2x4 о 


Son); no 
i=l 


Yes 


2Y -1 
—, no, not MVUE 
1-Ү 


8.103 
8.105 
8.107 
8.109 


8.111 
8.113 
8.115 
8.117 
8.119 
8.121 
8.123 


8.125 


8.129 а 


8.131 


8.133 Ь 


9.71 
9.75 


9.77 
9.81 
9.83 


9.85 


9.87 


9.91 
9.93 


9.97 


(1.407, 31.264); no

1 — 2(.0207) = .9586 
765 seeds 

a .0625 ± .0237

b 563 

n = 38,416 

n = 768 

(29.30, 391.15) 

11.3 ± 1.44


S S 
b (кк 52 nen 
1 ^ v2,vi,o/2 1 
v—n;—li-l2 
2(n — 1)o* 
n2 
An 1 
п+1 
20* 


п + п – 2 


1 
SPERM Ж уа 5) 2 

g 7 1 п 2 
With т, = — У^ 12, the MOM 


. mE 1— 2m, 
estimator of Ө is Ө = ————. 
4m, — 1 


Mia 

(225 Y; 2) ~) 
31.4104 ' 10.8508 

p̂A = .30, p̂B = .38

p̂C = .32; −.08 ± .1641

Yny/2 

a Yu 

c [(@/2)!/?" Ya), (1— (a/2)) ?" Ya] 

a 1/Y 

b 1/ř 
9.99


9.101 


9.103 


10.3 


10.5 
10.7 


10.17 


10.21 
10.23 


10.25 
10.27 
10.29 
10.33 
10.35 
10.37 
10.39 
10.41 
10.43 


10.45 
10.47 
10.49 
10.51 


10.53 


10.55 


| BU B) 
pia JP Ez P 
n =: == 
Е Y exp(—2Y 
exp(-P) E zap j РС?) 


1 n 
22 


Chapter 10 


а с=11 
b .596 
с .057 
c — 1.684 
False 
False 
True 
True 
False 
i True 
ii True 
iil False 
а Ho: ш = и, Ha: ш > ио 
С == .075 
z = 3.65, reject H0
a-b H0: μ1 − μ2 = 0 vs.
Ha: μ1 − μ2 ≠ 0, which
is a two-tailed test.
c z = −.954, which does
not lead to a rejection


о aA oD 


with α = .10.
|z| = 1.105, do not reject 
z = —.1202, do not reject 
z = 4.47 
z = 1.50, no 


z = — 1.48 (1 = homeless), no 
approx. 0 (.0000317) 

.6700 

025 

а 49 

b .1056 

.22 + .155 or (.065, .375) 

.5148 

129.146, yes 

z = 1.58, p-value = .1142, do not
reject

a z= —.996, p-value = .0618 
b No 

С z = —1.826, p-value = .0336 
d Yes 

z = −1.538; p-value = .0616; fail to
reject H0 with α = .01


9.105 


9.107 
9.109 


9.111 


10.57 
10.63 
10.65 


10.67 


10.69 


10.71 


10.73 


10.75 


10.77 


10.79 


p Е — ду 
.n 

exp(—t/Y) 
a ÁN, z2Y-1 

№ —1 
b 

3n 

252 + 85.193 
z = — 1.732; p-value = .0836 
a t = −1.341, fail to reject H0
a t = −3.24, p-value < .005, yes
b Using the Applet, .00241
c 39.556 ± 3.55
a t = 4.568 and t.01 = 2.821, so
reject H0.

b The 99% lower confidence bound
is 358 − 2.821(54/√10) = 309.83.
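
The lower-bound arithmetic in part b can be checked numerically. The following is a minimal Python sketch, assuming the bound has the usual one-sided form (sample mean minus critical value times s/√n); the function name `lower_confidence_bound` and its parameter names are illustrative and do not come from the text.

```python
import math


def lower_confidence_bound(mean, t_crit, s, n):
    """One-sided lower bound: mean - t_crit * s / sqrt(n).

    All four inputs are taken from the printed answer; the function
    itself is a hypothetical helper, not part of the book.
    """
    return mean - t_crit * s / math.sqrt(n)


# Values quoted in the answer: mean 358, t = 2.821, s = 54, n = 10.
bound = lower_confidence_bound(mean=358, t_crit=2.821, s=54, n=10)
print(round(bound, 2))
```

Rounded to two decimals, the result agrees with the 309.83 printed in the answer.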


a t = −1.57, .10 < p-value < .20,
do not reject; using applet,
p-value = .13008


i −t.10 = −1.319 and
−t.05 = −1.714;
.10 < p-value < .20.
ii Using the Applet,
2P(T < −1.57) =
2(.06504) = .13008.


a ȳ1 = 97.856, s1² = .3403,
ȳ2 = 98.489, s2² = .3011,
t = −2.3724, −t.01 = −2.583,
−t.025 = −2.12, so .02 < p-value
< .05

b Using Applet, .03054 

a t = 1.92, do not reject;
.05 < p-value < .10; applet
p-value = .07084

b t = .365, do not reject; p-value
> .20; applet p-value = .71936

t = −.647, do not reject

a t = −5.54, reject, p-value < .01;
applet p-value approx. 0

b Yes

c t = 1.56, .10 < p-value < .20;
applet p-value = .12999

Yes

χ² = 12.6, do not reject

.05 < p-value < .10

Applet p-value = .08248




10.83 


10.85 


10.89 


10.91 


10.93 
10.95 


10.97 
10.99 


10.101 


10.103 


10.107 


a σ1² ≠ σ2²
b σ1² < σ2²
c σ1² > σ2²
χ² = 22.45, p-value < .005; applet
p-value = .0001
S 
45 
25 
1 
Reject if Y >= 7.82. 
.2611, .6406, .9131, .9909 
= 16 
U = m У Y; has Xo 
distribution under Ho: reject Ho 
ifU > x2 
Yes 
Yes, is UMP 


Drzi 

i=l 

Use Poisson table to find k such
that P(ΣY_i > k) = α

Yes 


b» Y; «c 

i=l 

Yes 

Reject Ho if Yu) < 694/o 

Yes 

oc (n — 182 ш = 1S} a 
% 

xê +m-2) distribution under Ho; 

reject if x? > x2 


ys THM a0 TD 




Chapter 11 


ŷ = 1.5 − .6x
ŷ = 21.575 + 4.842x
a The relationship appears to be
proportional to x².
No
No, it is the best linear model.
ŷ = −15.45 + 65.17x
108.373
1 = 2.514
The least squares line is
ŷ = 452.119 − 29.402x
a SSE = 18.286;
S² = 18.286/6 = 3.048
b The fitted line is
ŷ = 43.35 + 2.42x*. The same


f£ ví. cn c 


10.109 


10.115 


10.117 


10.119 


10.121 
10.123 


10.125 


10.127 
(X)" (Y)


mX 4 nY Zn 

E y 
b X̄/Ȳ distributed as F with 2m and
2n degrees of freedom

True 

False 

False 

True 

False 

False 

False 


a 
b 
с 
d 
e 
f 
5 
h False 
i 
a 
b 
с 
d 
a 
b 
@ 


а = 


True 
t = −22.17, p-value < .01
−.0105 ± .001
Yes
No
H0: p = .20, Ha: p > .20
α = .0749
z = 5.24, p-value approx. 0
a F = 2.904, no
b (.050, .254)
a t = −2.657, .02 < p-value < .05
b −4.542 ± 3.046
Т = EN Е" к 
(Х+Ү —/)—(и1—и2—из) 


{Сш )[5@%-®°+! xai 0*4 уот] 


10.129 


11.19 


11.23 


11.25 


with (3n − 3) degrees of freedom


у = (== i = ху x 
n6,.o 
apl -> yay) +n). 


- Qi — 
91.0 

answer for SSE (and thus S²) is

found.

The least squares line is:

ŷ = 3.00 + 4.75x

s² = 5.025

t = −5.20, reject H0

.01 < p-value < .02

.01382 

(—.967, —.233) 

t = 3.791, p-value < .01 

Applet p-value = .0053 

Reject 

475 + .289 


£ 


aa со о.о ср 
11.29


11.31 


11.33 
11.35 
11.37 
11.39 
11.41 
11.43 
11.45 
11.47 
11.51 
11.53 


11.57 
11.59 


11.61 
11.63 


11.67 
11.69 


11.73 


12.1 
12.3 
12.5 


12.11 


T = oH m , where $ = о 
5 (52+) 

(SSEy + SSEw)/(n + m — 4). 11.79 
Hp is rejected in favor of H, forlarge 11.83 
values of |T |. 
t = 73.04, p-value approx. 0, Но is 11.85 
rejected 
t = 9.62, yes 11.87 
Xt. 
(4.67, 9.63) 
25.395 + 2.875 11.89 
b (72.39, 75.77) 
(59.73, 70.57) 
(—.86, 15.16) 11.91 
(.27, .51) 11.93 
t = 9.608, p-value < .01 11.95 
a r² = .682
b .682 
C t= 4.146, reject 11.97 
d Applet p-value — .00161 
a sign for r
b r and n
r = —.3783 
.979 + .104 11.99 
a f, = —.0095, By = 3.603 and 

à, = —(—.0095) = .0095, 11.101 


бо = ехр(3.603) = 36.70. 
Therefore, the prediction equation 
is ў = 36.70e 0, 
b The 90% CI for ao is 
(e35883, 2991] = (36.17, 37.23) 
ŷ = 2.1 − .6x
a ŷ = 32.725 + 1.812x
b ŷ = 35.5625 + 1.8119x − .1351x² 11.107
t = 1.31, do not reject 


Chapter 12 
пу = 34, m = 56 12.15 
п = 246, пу = 93, n; = 154 12.17 
With n = 6, three rats should receive 12.31 
x = 2 units and three rats should 
receive x = 5 units. 
a This occurs when ρ > 0.
b This occurs when ρ = 0. 12.35
c This occurs when ρ < 0.
d Paired better when ρ > 0,
independent better when ρ < 0
12.37 


21.9375 + 3.01 

Following Ex. 11.76, the 95% 
PI = 39.9812 + 213.807 
21.9375 + 6.17 


soe сә 


с 
Е 


F = 21.677, reject

SSE_R = 1908.08

F = 40.603, p-value < .005
950.1676 

F = 4.5, F, = 9.24, fail to 
reject Ho 

F = 2.353, Е = 2.23, reject Ho 
True 

False 

False 


= 10.21 


90.38 + 8.42 


ооо ъс are 


If 


ŷ = −13.54 − 0.053x
t = —6.86 

929 + .33 

ŷ = 1.4825 + .5x1 + .1190x2 − .5x3
ӯ = 2.0715 

t = —13.7, reject 
(1.88, 2.26) 
(1.73, 2.41) 


−9 < x < 9, choose n/2 at x = −9
and n/2 at x = 9.


a 
b 
d 


со = O 


ŷ = 9.34 + 2.46x1 + .6x2 + .41x1x2
9.34, 11.80

For bacteria A, ŷ = 9.34. For
bacteria B, ŷ = 11.80. The
observed growths were 9.1 and
12.2, respectively.

12.81 + .37 

12.81 + .78 

r = .89

t = 4.78, p-value <.01, reject 


t = 2.65, reject 

Ki 

Mi 1 

His n + o°] 

ш — ua, 20? [n, normal 

t = —4.326, .01 < p-value 
< .025 

—1.58 + 1.014 

65 pairs 


13.1 


13.7 


13.9 


Chapter 13 


с an 


F = 2.93, do not reject
.109


|t| = 1.71, do not reject, F = t²


F = 5.2002, reject
p-value = .01068


SSE = .020; F = 2.0, do not
reject


13.11 SST = .7588; SSE = .7462;
F = 19.83, p-value < .005, reject
13.13 SST = 36.286; SSE = 76.6996;
F = 38.316, p-value < .005, reject
13.15 F = 63.66, yes, p-value < .005
13.21 a −12.08 ± 10.96
b Longer
c Fewer degrees of freedom
13.23 a 1.568 ± .164 or (1.404, 1.732); yes
b (−.579, −.117); yes
13.25 .28 ± .102
13.27 a 95% CI for μA: 76 ± 8.142
or (67.868, 84.142)
b 95% CI for μB: 66.33 ± 10.51 or
(55.82, 76.84)
c 95% CI for μA − μB:
9.667 ± 13.295
13.29 a 6.24 ± .318
b −.29 ± .241
13.31 a F = 1.32, no
b (—21,421) 
13.33 (1.39, 1.93) 
13.35 a 2.7 ± 3.750
b 27.5 ± 2.652
13.37 a μ
b Overall mean
13.39 b (2σ²)/b
13.41 a F = 3.11, do not reject
b p-value > .10
c p-value = .1381
d 52 = 2MSE 
13.45 a F = 10.05; reject
b F = 10.88; reject
13.47 
Source       df   SS        MS        F
Treatments    3    8.1875   2.729     1.40
Blocks        3    7.1875   2.396     1.23
Error         9   17.5625   1.95139
Total        15   32.9375


F = 1.40, do not reject
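
The entries of the ANOVA table for 13.47 can be recomputed from the degrees of freedom and sums of squares alone. The sketch below is a hedged Python check, assuming the standard randomized block relations (mean square = SS/df, F = mean square / MSE); the variable names are illustrative and are not taken from the text.

```python
# (df, SS) for each row of the ANOVA table, copied from the answer above.
rows = {
    "Treatments": (3, 8.1875),
    "Blocks": (3, 7.1875),
    "Error": (9, 17.5625),
}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {name: ss / df for name, (df, ss) in rows.items()}

# Each F ratio divides a mean square by the error mean square (MSE).
mse = mean_squares["Error"]
f_treatments = mean_squares["Treatments"] / mse
f_blocks = mean_squares["Blocks"] / mse

# The Total row is the column sum of the other three rows.
total_df = sum(df for df, _ in rows.values())
total_ss = sum(ss for _, ss in rows.values())

print(f"MST = {mean_squares['Treatments']:.3f}, "
      f"MSB = {mean_squares['Blocks']:.3f}, MSE = {mse:.5f}")
print(f"F (treatments) = {f_treatments:.2f}, F (blocks) = {f_blocks:.2f}")
print(f"Total df = {total_df}, Total SS = {total_ss}")
```

The recomputed mean squares, F ratios, and totals agree with the printed table to the precision shown there.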


13.49 
13.53 
13.55 
13.57 
13.59 
13.61 
13.63 
13.69 


13.71 
13.73 


13.75 
13.77 


13.79 


13.81 


13.83 


F = 6.36; reject

The 95% CI is 2 ± 2.83.

The 95% CI is .145 ± .179.
The 99% CI is −4.8 ± 5.259.


na > 3 
b = 16; n = 48 
Sample sizes differ. 


a В, + b; 15 the mean response to 
treatment A in block III. 

b дз is the difference in mean 
responses to chemicals A and D in 
block III. 

F = 7; H0 is rejected

As homogeneous as possible within
blocks.

b F = 1.05; do not reject

a A 95% CI is .084 ± .06 or
(.024, .144).

a 16

b 135 degrees of freedom left for
error.

c 14.14

F = 7.33; yes; blocking induces loss in
degrees of freedom for estimating σ²;
could result in slight loss of information
if block to block variation is small

a 


Source       df   SS            MS           F
Treatments    2   524,177.167   262,088.58   258.237
Blocks        3   173,415        57,805.00    56.95
Error         6     6,089.5       1,014.9167
Total        11   703,681.667


13.87 


b 6 
c Yes, F = 258.19, p-value < .005
d Yes, F = 56.95, p-value < .005
e 22.527
f −237.25 ± 55.13
a SST = 1.212, df = 4
SSE = .571, df = 22
F = 11.68; p-value < .005
b |t| = 2.73; H0 is rejected; 2(.005)
< p-value < 2(.01).
Each interval should have confidence
coefficient 1 − .05/4 = .9875 ≈ .99;
μA − μD: .320 ± .251
μB − μD: .145 ± .251
μC − μD: .023 ± .251
μE − μD: −.124 ± .251
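
The per-interval confidence coefficient quoted above follows from the Bonferroni inequality: if each of m intervals has coefficient 1 − α/m, the simultaneous coefficient is at least 1 − α. A minimal Python sketch of that arithmetic follows; the function name `bonferroni_confidence` is hypothetical and chosen only for illustration.

```python
def bonferroni_confidence(alpha, m):
    """Per-interval confidence coefficient so that m intervals
    jointly hold with confidence at least 1 - alpha (Bonferroni)."""
    return 1 - alpha / m


# Four comparisons with an overall alpha of .05, as in the answer above.
level = bonferroni_confidence(alpha=0.05, m=4)
print(level)
```

With α = .05 and m = 4 this reproduces the .9875 in the answer, which the text then rounds to .99.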
13.89 b o; с 02 + ко
С oj = 0 d o2 
13.91 a ш;оў+]о? 
b ojt GDE 
Chapter 14 
14.1 a X² = 3.696, do not reject
b Applet p-value = .29622
14.3 X² = 24.48, p-value < .005
14.5 a z = 1.50, do not reject
b Hypothesis suggested by observed
data
14.7 .102 ± .043
14.9 a .39 ± .149
b .37 ± .187, .39 ± .182, .48 ± .153
14.11 X² = 69.42, reject
14.13 a X² = 18.711, reject
b p-value < .005
c Applet p-value = .00090
14.15 b X² also multiplied by k
14.17 a X² = 19.0434 with a p-value of
.004091.
b X² = 60.139 with a p-value of
approximately 0.
c Some expected counts < 5
14.19 a X² = 22.8705, reject
b p-value < .005
14.21 a X² = 13.99, reject
b X² = 13.99, reject
c X² = 1.36, do not reject
14.25 b X² = 19.1723, p-value =
0.003882, reject
c −.11 ± .135
14.27 X² = 38.43, yes
14.29 a X² = 14.19, reject
14.31 X² = 21.51, reject
14.33 X² = 6.18, reject; .025 < p-value
< .05
14.35 a Yes
b p-value = .002263
14.37 X² = 8.56, df = 3; reject
14.41 X² = 3.26, do not reject
14.43 X² = 74.85, reject
Chapter 15 
15.1
Rejection region      α
M ≤ 6 or M ≥ 19       P(M ≤ 6) + P(M ≥ 19) = .014
M ≤ 7 or M ≥ 18       P(M ≤ 7) + P(M ≥ 18) = .044
M ≤ 8 or M ≥ 17       P(M ≤ 8) + P(M ≥ 17) = .108
15.3 a m = 2, yes
b Variances not equal
15.5 P(M ≤ 2 or M ≥ 8) = .11, no
15.7 a P(M ≤ 2 or M ≥ 7) = .18, do
not reject
b t = −1.65, do not reject
15.9 a p-value = .011, do not reject
15.11 T = min(T+, T−), T = T−.
15.13 a T = 6, .02 < p-value < .05
b T = 6, .01 < p-value < .025
15.15 T = 3.5, .025 < p-value < .05
15.17 T = 11, reject
15.21 a U = 4; p-value = .0364
b U = 35; p-value = .0559
c U = 1; p-value = .0476
15.23 U = 9, do not reject
15.25 z = −1.80, reject
15.27 U = 0, p-value = .0096
15.29 H = 16.974, p-value < .001
15.31 a SST = 2586.1333; SSE =
11,702.9; F = 1.33, do not
reject
b H = 1.22, do not reject
15.33 H = 2.03, do not reject
15.37 a No, p-value = .6685
b Do not reject H0
15.39 Fr = 6.35, reject
15.41 a Fr = 65.675, p-value < .005,
reject
b m = 0, P(M = 0) = 1/256,
p-value = 1/128
15.45 The null distribution is given by
P(Fr = 0) = P(Fr = 4) = 1/6 and
P(Fr = 1) = P(Fr = 3) = 1/3.
15.47 R = 6, no


15.49 


15.51 
15.53 
15.55 


15.57 
15.59 


16.1 


16.3 


16.7 


16.9 


16.11 


a .0256

b An unusually small number of runs
(judged at α = .05) would imply a
clustering of defective items in
time; do not reject.

R = 13, do not reject

rs = .911818; yes. 

а rs = —.8449887 

b Reject 

rs = .6768, use two-tailed test, reject 

rs = 0; p-value < .005 


Chapter 16 


В(10, 30) 

n= 25 

В(10, 30), n = 25 

Yes 

Posterior for the В(1, 3) prior. 
Means get closer to .4, std dev 
decreases. 

e Looks more and more like normal 


боо Ссс c» 


distribution. 
Y+1 
a 
n+4 
b np--l np(l-— р) 
п+4” (n+4)? 
b atl | 
o4 B-4cY' 


(е ++ D(8-Y - 1) 
(a -- B -- Y - D(a-* B- Y) 


y nf 1 
(ркт) +99 (ян) 


15.61 


15.63 


15.65 
15.67 
15.69 
15.71 
15.73 


16.13 


16.15 
16.17 
16.19 
16.21 


16.23 


16.25 


a Randomized block design 

b No 

c p-value = .04076, yes 

T = 73.5, do not reject, consistent with 
Ex. 15.62 

U = 17.5, fail to reject Ho 

.0159 

H = 7.154, reject

Fr = 6.21, do not reject

.10 


a (.099, .710) 

b Both probabilities are .025 

c P(.099 < p < .710) = .95

h Shorter for larger n. 

(.06064, .32665) 

(.38475, .66183) 

(5.95889, 8.01066) 

Posterior probabilities of null and 
alternative are .9526 and .0474, 
respectively, accept Ho. 

Posterior probabilities of null and 
alternative are .1275 and .8725, 
respectively, accept Ha.

Posterior probabilities of null and 
alternative are .9700 and .0300, 
respectively, accept Ho. 


INDEX 


A 
Acceptance region, 511 
Addition of matrices, 821—822 
Additive law of probability, 
58, 699 
for conditional probabilities, 61 
effect of mutually exclusive 
events on, 63 
Allometric equations, 606 
Alternative hypothesis, 489-490 
choice of, 500, 519 
lower-tail, 499 
simple, 542, 555 
small-sample test, 521 
two-tailed, 499, 500 
upper-tail, 497 
Analysis of categorical data, 713-740
Analysis of variance (ANOVA),
661-712
assumptions for, 670 
F test and, 665, 666, 670 
introductory overview of, 
661—662 
linear models for, 701—705 
one-way layouts and, 667—679 
procedure for, 662—667 
randomized block design and, 
688—695 
selecting the sample size for, 
696-698 
sums of squares, 679-680 
Analysis of variance table, 671 
for one-way layouts, 671—677 
for randomized block design, 
689, 690 
ANOVA or AOV. See Analysis of 
variance 
Applets
Bayes’ Rule as a Tree, 72-73
Beta Probabilities and Quantiles,
195, 198, 199, 200, 217,
811-812, 815
Binomial Revision, 805-806, 811
Chi-Square Probabilities and
Quantiles, 357, 365, 366,
533, 718, 719, 724, 738,
768, 773
Comparison of Beta Density
Functions, 194, 197
Comparison of Gamma Density
Functions, 186, 189, 190,
366
ConfidenceIntervalP, 415,
416-417
DiceSample, 348, 349, 350
Fitting a Line Using Least
Squares, 572, 574, 602
F-Ratio Probabilities and
Quantiles, 363, 367, 535,
537, 540, 627, 630, 666,
667, 671, 673, 674,
691, 692, 704
Gamma Probabilities and
Quantiles, 186, 190, 192,
210, 217, 218, 411, 811,
812-813, 815
Hypothesis Testing (for
Proportions), 501-503, 520
Normal Approximation to
Binomial Distribution, 382,
383, 385
Normal Probabilities, 181, 182,
183, 515
Normal Tail Areas and Quantiles,
179, 183, 184


PointbyPoint, 455 
Point Estimation, 455 
PointSingle, 454—455 
Removing Points from 
Regression, 639 
Sample Size, 352, 373—374 
Sampling Distribution of the 
Mean, 351 
Sampling Distribution of the 
Variance, 352 
Student's t Probabilities and 
Quantiles, 361, 366, 522, 
525, 526, 586, 601, 605, 
619, 647, 700 
VarianceSize, 353 
Arithmetic mean. See Mean 
Association between populations, 
784—785 
Asymptotic normal distribution, 372 
Attained significance levels, 
513-518, 745-746 


B 
Balanced layout, 670 
Bayes, Thomas, 817 
Bayes estimator, 800—805 
Bayesian methods, 796-819 
credible intervals, 808-813 
priors, posteriors, and estimators, 
797-808, 816 
tests of hypotheses, 813—815 
Bayes’ rule, 71—72 
Bayes’ Rule as a Tree applet, 72—73 
Bell-shaped distribution. See 
Normal distribution 
Bernoulli probability function, 798 
Bernoulli random variable, 166, 
322, 462, 466 


Beta density function, 194—196 
Beta distribution, 194—201 
Bayesian priors and posteriors, 
799-800, 801, 816 
incomplete beta function, 194 
mean, 195, 837 
moment-generating function, 837 
probability function, 837 
related to binomial distribution, 
195 
of the second kind, 343 
variance, 195, 837 
Beta function 
incomplete, 194 
related to gamma function, 835 
Beta prior distribution, 816 
Beta Probabilities and Quantiles 
applet, 195, 198, 199, 200, 
217, 811-812, 815 
Biased estimators, 392, 393 
Bayes estimators as, 803, 818 
sampling distribution for, 393 
Bias of point estimators, 392 
Binomial coefficients, 46 
Binomial distribution, 100—114 
central limit theorem and, 
378-385 
cumulative form for, 194 
formula for, 103 
histograms, 104 
hypergeometric distribution and, 
128 
mean, 106—108, 836 
moment-generating function, 836 
negative, 121-125 
normal approximation to, 
378-385 
tables for, 838—840 
variance, 106—108, 836 
Binomial expansion, 46, 
104, 835 
Binomial experiments, 101—102, 
103, 280 
Binomial Revision applet, 805—806, 
811 
Binomial probability function 
related to incomplete beta 
function, 194 
tables, 194—195 
Bivariate density function, 228, 229, 
284 


Bivariate distributions, 224—235 
transformation method and, 314 
Bivariate normal distribution, 
283-285 
testing for independence in, 
598—599 
Bivariate probability function, 
224—225 
Bivariate transformation method, 
325-333 
Block designs 
Latin square, 655 
randomized block, 654—655 
Block effects, 686 
Bonferroni inequality, 62, 699 


C 
Categorical data 
analysis of, 713—740 
definition of, 713 
chi-square test and, 734—735, 736 
experiments with, 713—714 
methods for analyzing, 734—735 
Cell frequencies, estimating 
expected, 717, 723—724, 735 
Cell probabilities, testing 
hypotheses concerning, 
716—721, 735 
Central limit theorem, 201, 
370—385 
binomial distributions and, 
378—385 
formal statement of, 372 
moment-generating functions 
and, 377-378 
proof of, 377—378 
uses for, 370, 378 
Central moment, 138, 202 
Central tendency, measures of, 9 
Chi-square distributions 
degrees of freedom for, 322, 434, 
716 
density function for, 434 
hypothesis tests and, 715—716 
inferential procedures and, 357 
mean and variance for, 837 
moment-generating function, 
321-322, 837 
density function, 837 
table of percentage points of, 
849—850 
Chi-Square Probabilities and
Quantiles applet, 357, 365, 
366, 533, 718, 719, 724, 
738, 768, 773 
Chi-square random variable, 
187-188 
Chi-square test, 714—716 
categorical data analysis and, 
734-735, 736 
goodness-of-fit and, 717—718, 
735 
test statistic for, 715 
for population variance, 532—533 
CM (correction for the mean), 668 
Coefficient of determination, 601 
multiple, 627 
Coefficient of variation, 387 
Coefficients 
binomial, 46 
confidence, 406—407, 437 
multinomial, 45 
Combinations, 46 
Combinatorial analysis, 38-39 
counting rules in, 40—51 
results from, 41, 44 
Comparison of Beta Density 
Functions applet, 194, 197 
Comparison of Gamma Density 
Functions applet, 186, 189, 
190, 366 
Complement, 24 
probability of, 58—59, 66 
of rejection region, 511 
Complementary events, 66 
Completely randomized design, 
652, 654 
difference from randomized 
block design, 654, 686 
experimental error, 654 
Complete model, 624, 626-628 
Completeness, 472 
Composite hypothesis, 542 
Compound events, 27, 28—29 
Conditional density function, 
240-241 
Conditional discrete probability 
function, 239 
Conditional distribution, 
238-242 
Conditional distribution function, 
240 
Conditional expectations, 285—290
Conditional mean, 287 
Conditional probability, 47, 51—57 
binomial experiments and, 102 
unconditional probability vs., 
51-52 
Conditional probability distribution, 
238—242 
continuous, 240—241 
discrete, 238—239 
Confidence bands, 596, 597 
Confidence bound, 412, 426, 434, 
512 
Confidence coefficient, 406—407, 
437, 699 
simultaneous, 699—700 
Confidence intervals, 406—437 
Bayesian credible intervals and, 
808-809 
compared to prediction interval, 
596 
difference between means and, 
427-429, 681—682, 695 
for E(Y), 591, 596-597 
hypothesis testing and, 511-513 
large-sample, 411-421, 483-484 
for least-squares estimator, 586 
matched-pair experiments and, 
647 
for mean, 425-434, 681—682 
multiple linear regression and, 
618 
null hypothesis and, 511 
one-sided, 407, 426 
one-way layouts and, 681—683 
overview of, 406—407 
for p of binomial distribution, 411 
for (p1 − p2), 411
for parameter βi, 585
pivotal method for, 407—409 
for population mean, 411, 425, 
427, 430 
for population variance, 434—435 
randomized block design and, 
695 
relationship with hypothesis 
testing, 511 
relationship with t test, 525
sample size and, 421—425 
simple linear regression and, 586, 
590, 591, 596-597 


simultaneous, 698—701 
small-sample, 425—434 
sufficient statistics and, 468 
treatment means and, 681—682 
two-sided, 407, 426, 511—512 
unbiased, 443 
upper limits of, 406, 412, 426, 
434 
width of, 640 
ConfidenceIntervalP applet, 415,
416-417 
Confidence level, 422 
Confidence limits, 406, 408—409, 
412, 413, 414, 426 
Conjugate priors, 800, 816 
Consistency, 448—459 
Consistent estimator, 449, 450 
Contingency tables, 721—734 
degrees of freedom for, 723—724 
fixed row or column totals in, 
729—734 
independent classifications and, 
722 
maximum-likelihood estimators 
and, 722-723 
Continuity correction, 382 
Continuous distribution, 158-169 
Continuous random variables, 
157-222 
beta distribution, 194—201 
conditional distribution, 
240-242 
definition of, 160 
density function of, 161—165 
distribution function of, 160—165 
expected values of, 170-174, 
202-207, 256 
gamma distribution, 185—194 
independence of, 248 
jointly continuous, 226—228 
kth moment about the origin, 202
marginal density functions, 236 
median of the distribution of, 176 
moment-generating functions of, 
202-207 
normal distribution, 178-184 
Tchebysheff's theorem and, 
207-210 
uniform distribution, 174—178 
Controlled independent variables, 
661 


Convergence, 448—449, 451, 453, 
457 
Correction for the mean (CM), 668 
Correlation, 598—604 
Correlation coefficient 
covariance and, 265 
interpreting values of, 601 
Kendall’s rank, 783 
sample notation for, 599 
Spearman’s rank, 783—789 
Counting rules, 40-51 
Covariance, 264—270 
computational formula for, 266 
correlation coefficient and, 
265-266 
definition of, 265 
independent variables and, 267 
least-squares estimators, 578-579 
linear functions and, 271—276 
multinomial experiments and, 
281-282 
zero, 267—268, 284 
Cramer-Rao inequality, 448 
Credible intervals, 808—813 
Critical values 
of F statistic, 690—691 
of Spearman's rank correlation 
coefficient, 871 
of T in Wilcoxon signed-ranks 
test, 867—868 
Cumulative distribution function, 
158 
Curvature, detecting, 643 


D 


Decomposition of events, 70 
Degrees of freedom 
for chi-square distribution, 322, 
434, 716 
for contingency tables, 723—724 
for F distribution, 362, 626, 665 
for sum of squares, 688 
for t distribution, 360, 426, 430, 
584 
DeMorgan's laws, 25 
Density functions 
beta, 194—196 
bivariate, 228, 229, 284 
chi-square, 434 
conditional, 240—241 
definition of, 161 


distribution function and, 298, 
301, 304 
exponential, 188, 371 
F distribution, 362 
gamma, 185—187 
increasing, 311—312 
joint, 227, 230, 325 
kth-order statistic, 336 
log-normal, 218 
marginal, 236, 335 
minimum/maximum, 333 
model selection, 201 
multivariate normal, 283-284 
normal, 178-179 
order statistics and, 333-338 
parameters of, 175 
posterior, 797—798, 800, 801, 817 
properties of, 162 
Rayleigh, 318 
t distribution, 360, 426 
uniform, 175 
Weibull, 219, 317, 339 
Dependence, measures of, 264 
Dependence between two 
classification criteria, 721 
Dependent events, 53 
Dependent random variables, 247, 
564 
Dependent variables, 4, 247, 564 
Design of experiments. See 
Experimental design 
accuracy-increasing, 641—644 
block, 654—656 
completely randomized, 652, 654 
Latin square, 655 
matched pairs, 644—651 
optimal, 643 
randomized block design, 
654—655, 686—696 
sample size and, 421—422, 
696—698 
Determination, coefficient of, 601 
Deterministic models, 564—565, 566 
Deviations 
sum of squares of, 569, 662 
total sum of squares of, 662—663 
See also Standard deviation 
Diagrams, Venn, 23—25 
DiceSample applet, 348, 349 
Difference between means 


ANOVA procedure and, 667—671 
confidence intervals and, 
427-429, 681—682, 695 
experimental design and, 
641-642 
matched-pairs experiment and, 
645—646 
one-way layouts and, 681—682 
randomized block design and, 
695 
small-sample tests for, 523—525 
Discontinuous functions, 210 
Discrete distribution, 87-91, 514 
Discrete random variables, 86-156 
binomial distribution, 100-114 
conditional distribution, 238-239 
distribution function for, 159
definition of, 87 
expectation theorems, 94—96 
expected values of, 91—100, 256 
geometric distribution, 114—121 
hypergeometric distribution, 
125-130 
independence of, 247, 248 
mean of, 95, 150 
moment-generating functions, 
138-143 
negative binomial distribution, 
121-125 
Poisson distribution, 131-138 
probability distributions for, 
87—91 
probability-generating functions 
for, 143-146 
Tchebysheff’s theorem and, 
146-149 
variance of, 95—96, 150 
Discrete sample space, 28 
Disjoint sets, 24 
Dispersion, measures of, 9 
Distribution functions 
conditional, 240 
continuous random variable, 
160-165 
cumulative, 158 
density function and, 298, 301, 
304 
discrete random variable, 
158-160 
of gamma-distributed random 
variable, 185 
joint, 226-227
method of, 298-310
multivariate, 232
order statistics and, 333
properties of, 160
random variable, 158-165
of t, 453
U test statistic, 861-866
Distribution functions method,
298-310
summary of, 304
transformation method and,
310-311
Distributions, 4
bell-shaped or normal, 5
beta, 194-201
binomial, 100-114
bivariate, 224-235
bivariate normal, 283-285
chi-square, 187-188
conditional, 238-242
continuous, 158-169
discrete, 87-91
exponential, 837
F, 362
of functions of random variables,
297
gamma, 185-194
geometric, 114-121
hypergeometric, 125-130
joint, 224
log-gamma, 344
log-normal, 218, 344
marginal, 235-238, 288-289
Maxwell, 220
mixed, 211-212
multinomial, 279-283
multivariate normal,
283-285
negative binomial, 121-125
normal, 178-184
Pareto, 310
Poisson, 131-138
relative frequency, 4, 5
sampling, 346-389
standard normal, 318
skewed, 185
Student's t, 359-361
Tchebysheff's theorem and,
146-149
uniform, 174-178
unique characterization of, 138
Weibull, 202, 219
Distributive laws, 25
Dummy variable, 701


E 
e^(−x) table, 841
E(Y), 91 
Effect of treatment, 678 
Efficiency, relative, 445—448 
Elementary experimental designs, 
651-656 
Empirical rule, 10, 11 
Empty set, 23 
Error of estimation, 297, 399—400 
good approximate bound on, 401 
probabilistic bound on, 400 
sample size and, 421—422 
Errors 
experimental, 654 
mean square, 393 
prediction, 594—595, 622-623 
random, 568, 584, 633 
standard, 397, 399, 645 
type I, 491, 493-494 
type II, 491, 493-494, 507-510, 
541 
See also Sum of squares for error 
Estimated expected cell frequencies, 
723 
Estimation, 390-443 
error of, 297, 399-400, 422 
goodness of, 556 
inferences and, 556 
least squares method of, 
564-639 
maximum-likelihood, 476—483 
method of moments, 472-476 
minimum-variance unbiased, 
464—472 
one-way layouts and, 681—685 
randomized block design and, 
695—696 
Estimators 
Bayes, 800-805 
biased, 392, 393, 803, 818 
for comparing two population 
means, 451 
confidence intervals, 406—411 
consistency of, 448—459 
definition of, 391 


efficient, 448 
goodness of, 392, 399-406 
human, 391 
interval, 406 
large-sample interval, 411—421 
least-squares, 571, 577—583, 633 
maximum-likelihood, 477—485 
mean square error of, 393 
method-of-moments, 472—475, 
603 
minimum-variance unbiased, 
465—472 
point, 392-399, 444—464 
pooled, 428, 664, 681 
of population variance, 357 
relative efficiency of, 445-448 
sampling distribution of, 444 
sequence of, 454 
unbiased, 392, 393, 396—399, 
445, 577 
See also Point estimators 
Even functions, 221 
Event-composition method, 35, 
62—69 
examples of using, 62—63, 64—68 
steps in process of, 64 
Events, 27 
complementary, 66 
compound, 27, 28-29 
decomposition of, 70 
dependent, 53 
discrete sample space, 29 
independent, 53 
intersection of two or more, 
223-235 
intersection of n, 231 
mutually exclusive, 58, 59 
numerical, 75 
random, 20 
simple, 27 
stochastic, 20 
symmetric difference between, 74 
Expectations 
conditional, 285—290 
discontinuous functions and, 
210-213 
mixed distributions and, 210—213 
Expected cell frequencies, 723 
Expected values 
conditional, 285—290 
of a constant, 258 


of a constant times function, 95 
continuous random variables and, 
170-174, 202-207, 256 
definition of, 91 
discrete random variables and, 
91—100, 256 
of discontinuous functions, 
210-211
for hypergeometric random 
variable, 275 
independent random variables 
and, 259—260 
least-squares estimators and, 
577—581 
linear functions and, 270-279 
MST for one-way layout and, 
679—681 
for mixed distributions, 211—213 
multinomial experiments and, 
281-282 
multivariate distributions and, 
255-261 
point estimators and, 397, 399 
for Poisson random variable, 
134-135 
posterior, 800 
runs test and, 782 
special theorems for computing, 
258-261 
standard deviation as, 93 
of sum of functions, 94—95 
theorems for multivariate random 
variables, 258—259 
theorems for univariate random 
variables, 95—96 
U test statistic and, 761—762 
variance as, 94, 171 
Experimental design, 78, 421, 
640—660 
accuracy in, 641-644 
completely randomized, 652, 654 
elementary designs in, 651-656 
Latin square, 655, 662 
matched-pairs, 644—651 
randomized block, 654—655, 
686—696, 703 
sample size and, 696-698 
Experimental units, 652 
Experiments, 26—35 
binomial, 101—102, 280 
categorical data, 713—714 


definition of, 27 
design of, 78, 421, 640-660 
errors associated with, 654 
factors and levels of, 652, 661 
independent samples, 645 
matched-pairs, 641, 644-651, 
744-750 
multinomial, 279—280, 713-714 
paired-difference, 648 
probabilistic model for, 26—35 
random sampling in, 77—79 
single-factor, 652 
Exponential density function, 188, 
371 
Exponential distribution, 186, 
188-189, 201, 306-307 
mean and variance of, 837 
memoryless property of, 189 


moment-generating function of, 837 


F 
F (test) 
analysis of variance and, 668 
hypothesis testing concerning 
variances, 533—540 
test statistic, 535 
Fr (test statistic), 772
F(y) and f(y), 158, 160, 161, 162 
Factor, 652, 656, 661 
Factorial moment, 144 
Factorization criterion, 461, 468, 
470 
F distribution, 362-363, 536, 537,
625, 628
degrees of freedom for, 362, 626,
665
table of percentage points of,
851-860
Fit, lack of, 634
Fitted models, 628-630
Fitting a Line Using Least Squares
applet, 572, 574, 602
Fixed block effects model, 686
Fixed row and column totals,
729-731
F-Ratio Probabilities and Quantiles
applet, 363, 367, 535, 537,
540, 627, 630, 666, 667,
671, 673, 674, 691, 692, 704
Frequency distributions, 9-11
Frequency histograms. See
Histograms
Frequentist approach, 814, 818, 819 
Friedman, Milton, 771 
Friedman test, 771—777 
sign test and, 773 
summary of, 772 
F test 
ANOVA and, 665, 666, 670 
variance and, 536—537 
Functions 
of continuous random variables, 
expected value of, 170-171 
density. See Density function 
discontinuous, 210 
distribution, 158, 298—310 
expected value of, 171 
finding distribution of, 297—298 
gamma, 185 
increasing, 311 
likelihood, 542, 553 
linear, 270—279, 589—593, 
598—604, 616—622 
methods for finding probability 
distribution of, 297—325 
mixed distribution, 211—212 
moment-generating, 138—143, 
202-206 
of normally distributed random 
variables, 321—322 
probability. See Probability
function
probability-generating, 143-146 
of random variable, expected 
value of, 92-100, 204 
of random variable, finding the 
moments of, 205 
random variable, 296—345 
reliability, 343 
step, 159 
See also Density functions; 
Distribution functions; 
Probability functions 
Functions of random variables, 
296—345 
distribution functions and, 
298-310 
finding the distribution of, 
297—298 
moment-generating functions 
and, 298, 318-325 
multivariable transformations
and, 325-333
order statistics and, 333-340
transformation method and, 298,
310-318

G 


Gamma density function, 185-187 
beta function related to, 835 
Gamma distribution, 185-194 
chi-square random variable, 
187-188 
exponential density function, 188 
log-gamma distribution, 344 
mean, 186, 837 
moment-generating function, 837 
parameters associated with, 
185-186 
probability function, 837 
variance, 186, 837 
Gamma Probabilities and Quantiles 
applet, 186, 190, 192, 210, 
217, 218, 411, 811, 
812-813, 815 
Gamma random variable, 187—188 
chi-square, 187—188 
moment-generating function, 203 
Geometric distribution, 114—121 
mean, 116-117, 836 
moment-generating function, 836 
probability function, 836 
variance, 116—117, 836 
Geometric random variable, 
116-117 
mean and variance of, 150 
probability-generating function 
for, 145 
Geometric representations 
joint density function, 229, 230, 
231 
marginal density function, 238 
Geometric series, formula for sum 
of, 67, 835 
Goodness 
of estimation procedure, 556 
of point estimators, 392, 
399—406 
of statistical tests, 540, 556 
Goodness-of-fit test, 717—718, 735 
Graphical descriptive methods, 
3-8 
H
H (test statistic), 766, 767, 768 
Hierarchical models, 288—289 
High-influence points, 634, 639 
Histograms, 4—6 
area under, 5—6 
binomial distribution, 104 
bivariate distribution function, 
159 
construction of, 4—5 
density functions and, 201 
exponential distribution, 371 
geometric distribution, 115 
probability, 89, 94, 104, 115 
probabilistic interpretation of, 
5-6 
relative frequency, 4, 371 
three dimensional, 225 
Homogeneity test, 731 
Hooke's law, 587 
Hypergeometric distribution, 
125-130 
mean, 127, 836 
moment-generating function, 836 
probability function, 836 
variance, 127, 836 
Hypergeometric random variable, 
126, 150 
Hypothesis 
alternative, 489—490, 496, 519 
composite, 542 
null, 489—490, 496, 519 
research, 489—490 
simple, 541—542 
Hypothesis testing, 488—562 
attained significance levels in, 
513-518 
chi-square distribution and, 
715—716 
commentary on the theory of, 
518-520 
confidence intervals related to, 
511-513 
elements of statistical tests and, 
489-495 
errors in, 491, 493-494, 507-510 
introductory overview of, 
488—489 
large samples used in, 496—507 


likelihood ratio tests for, 549-556 


mean and, 520—530 


multiple linear regression and, 
618 

Neyman-Pearson Lemma and, 
542—546 

null hypothesis and, 489—490, 
624—633 

power of, 540—549 

reporting results of a test, 
513-518 

simple linear regression and, 590 

small samples used in, 520—530 

type II error probabilities in,
507—510 

variances and, 530—540 


Hypothesis Testing (for
Proportions) applet,
501-503, 520


Hypothesis tests 


acceptance region of, 511 

attained significance levels, 
513-518 

Bayesian, 813-815 

for β_i, 565

for categorical data, 713—740 

for cell probabilities, 716—721, 
735 

chi-square, 714—716, 717, 
734-735

choice of appropriate, 500 

elements of, 489—495 

errors in, 491, 493-494, 507—510 

F test, 530—533, 665 

Friedman, 771—777 

goodness-of-fit, 717—718, 735 

Kruskal-Wallis, 765—771 

large-sample, 496—507 

least-squares estimator, 585 

level of, 491 

likelihood ratio, 549—556 

Mann-Whitney U, 758—765 

most powerful, 542—543 

Neyman-Pearson lemma for, 
540—549 

nonparametric, 741—795 

one-tailed, 499 

power of, 540-549 

p-values in, 513-518 

rank-sum, 755—757, 758, 762 

rejection region of, 490—491, 499, 
500 

sign, 744—750 


small-sample, 520—530 

Spearman rank correlation, 
783-789 

two-tailed, 499 

uniformly most powerful, 
544-546 

Wilcoxon signed-rank, 750—755 

Z-test, 507-510 


I
Identity elements, 824—826 
Identity matrix, 825 
Incomplete beta function, 194 
Increasing functions, 311—312 
Independence, 247—250 
definition of, 247 
establishing, 247—248 
testing for, 598-599 
Independent events, 53 
Independent random samples, 653 
Mann-Whitney U test for, 756, 
758-765 
rank-sum test for, 755—757 
Independent random variables, 
247-255 
continuous variables as, 248 
covariance of, 267 
definition of, 247 
moment-generating functions of, 
320 
Independent samples experiment, 
645 
Independent variables, 564 
controlled, 661 
regression of, 566 
rescaling, 628 
sum of squares for, 356 
Indicator variable, 701 
Inequality 
Bonferroni, 62, 699 
Cramer-Rao, 448 
Markov, 221 
Inference, 2 
Inference making, 2, 13-14 
Bayesian approach to, 796-819 
estimation and, 556 
hypothesis testing and, 556 
least-squares estimators and, 
584—589 
multiple linear regression, 
616—622 


probability and, 21—23 


simple linear regression, 589-593 


statistics and, 347 
Integer-valued random variables, 
143-144 
Integration 
limits of, 250 
region of, 231, 302 
Intersection 
of events, 57, 223-224 
probability of, 57 
of sets, 24 
Interval estimate, 391 
Interval estimators, 406 
Intervals 
Bayesian credible, 808—813 
prediction, 595—597, 608, 623 
See also Confidence intervals 
Invariance property, 480 
Inverse of distribution function, 
306—307 
Inverse of a matrix, 826 
Inverting a matrix, 829—833 


J 
Jacobians, 325—333 
Joint density function, 227—228 
expected values and, 260—261 
geometric representations of, 
229, 230, 231 
order statistics and, 334, 
336, 337 


transformation method and, 314,
325-330


Joint distribution function, 226-227 
for continuous random variables, 
227-228 
for discrete random variables, 227 
order statistics and, 334 
Jointly continuous random 
variables, 227, 228 
Joint probability function, 225-232 
Joint probability mass function, 225 


K 
Kendall’s rank correlation 
coefficient, 783 
Kruskal-Wallis test, 765—771 
rank-sum test and, 768 
summary of procedure, 767 


kth factorial moment, 144 

kth moment of a random variable, 
138, 202, 472 

kth-order statistic, 336 


L 
Lack of fit, 634 
Large numbers, law of, 451 
Large samples 
confidence intervals and, 
411-421, 483-484 
Friedman test for, 772 
hypothesis tests and, 496—507 
Kruskal-Wallis test for, 
766—767 
likelihood ratio tests and, 553 
Mann-Whitney U test for, 
761-762 
maximum-likelihood estimators 
and, 483—485 


sign test for comparing, 746—747 


Wilcoxon signed-rank test for, 
752-753
Latin square design, 655, 662 
Law of large numbers, 451 
Law of total probability, 
70—75 
Laws of probability, 57—62 
additive law, 58 
multiplicative law, 57 
law of total probability, 70—75 
Laws of sets 
DeMorgan's, 25 
Distributive, 25 
Layout, one-way, 653, 662 
Least-squares equations, 570, 
610-611 
general linear model and, 611 
solving using matrix inversion, 
833 
Least-squares estimators 
confidence interval for, 586 
covariance for, 578—579 
expected value for, 577—581 
hypothesis test for, 585 


inferences concerning parameters
of, 584-589, 616-622
multiple linear regression and, 
615-616 
notation used for, 577 
properties of, 577—583, 616 
simple linear regression and, 571,
577-583, 610
unbiased, 577 
variance for, 577—581 
Least-squares method. See Method 
of least squares 
Level of a factor, 652, 661 
Level of the test, 491 
Likelihood estimation. See Method 
of maximum likelihood 
Likelihood function, 460, 461, 467, 
471, 549 
Likelihood of the sample, 460—461 
Likelihood ratio tests, 549—556 
description of, 549—550 
large-sample, 553 
power of the test and, 553 
rejection region of, 550, 552 
Linear correlation, simple 
coefficient of, 264 
Linear dependence, 265 
Linear equations 
matrix expression for system of 
simultaneous, 827-829 
solving a system of simultaneous, 
833-834 
Linear functions 
correlation and, 598—604 
covariance and, 271—276 
expected values and, 270—279 
least squares estimators as, 582 
inferences concerning, 589—593, 
616—622 
of model parameters, 589—593, 
616-622 


of random variables, 270—279 
variance and, 270-279 


Linear models 


analysis of variance using, 
701-705 

fitting using matrices, 609-615, 
628—629 

least-squares equations and, 611 

randomized block design and, 
703 

slope of the line in, 642—643 

solutions for general linear 
model, 611 

using for analysis of variance, 
701-705


Linear regression models, 566—567 
multiple, 566-567, 569, 609,
615-622 
simple, 566, 569, 589—597 
Linear statistical models, 566—569 
analysis of variance and, 701—705 
correlation and, 598—604 
definition of, 568 
estimating parameters of, 569 
inferences about parameters in, 
584—593, 616—622 
least-squares procedure and, 
569—576 
matrices used with, 609—615 
multiple linear regression, 
615-624 
predicting values using, 593—597, 
622—624 
simple linear regression, 
577-583, 589-597 
test for null hypothesis, 624—633 
Location model, 743 
Logarithmic series distribution, 739 
Log-gamma distribution, 344 
Log-normal distribution, 218, 344 
Lower confidence bound, 412, 426, 
512 
Lower confidence limit, 406 
Lower-tail alternative, 499 
Lower-tail rejection region, 499 


M 
M (test statistic), 744 
Main diagonal, 825 
Mann-Whitney U test, 756, 
758-765 
efficiency of, 762 
formula for, 758 
large samples and, 761—762 
rank-sum test and, 758, 762 
runs test and, 781 
summaries of, 759—760, 762 
usefulness of, 762 
Marginal density function, 
236-238 
Marginal distribution, 235—238, 
288-289, 816 
continuous, 236 
discrete, 236 
Marginal probability function, 236 
Markov inequality, 221 
Matched-pairs experiment, 641 


experimental design of, 644—651 
sign test for, 744—750 
usefulness of, 648 
Wilcoxon signed-rank test for, 
750-755 
Mathematical models. See Models 
Matrices, 820-834 
addition of, 821—822 
algebra dealing with, 821, 823 
definition of, 820 
dimensions of, 820-821 
elements of, 820-821 
expression for a system of 
simultaneous linear 
equations, 827—829 
fitting linear models using, 
609—615, 628—629 
identity elements of, 824—826 
inverse of, 826 
inverting, 829—833 
main diagonal of, 825 
multiplication of, 822-824 
real number multiplication of, 
822 
solving system of simultaneous 
linear equations using, 
833—834 
square, 825 
transpose of, 827 
Matrix algebra, 821, 823 
identity elements in, 825 
Maximum-likelihood estimators 
(MLEs), 477-485 
chi-square computations 
and, 716 
contingency tables and, 722—723 
invariance property of, 480 
large-sample properties of, 
483-485 
See also Method of maximum 
likelihood 
Maximum of random variables, 333 
Maxwell distribution, 220 
Mean 
beta distribution, 195, 837 
binomial distribution, 106—108, 
836 
chi-square distribution, 837 
comparison of, 427-428, 
667-671 
conditional, 286 


confidence intervals for, 
425-434, 681-682 
correction for, 668 
difference in, 409, 427—430, 451, 
522-524, 641-642, 646-647 
discrete random variable, 95, 150 
estimating, 296—297 
exponential distribution, 837 
F distribution, 362 
formula for, 9 
gamma distribution, 186, 837 
geometric distribution, 116—117, 
836 
hypergeometric distribution, 127, 
836 
hypothesis tests for, 520-530 
kth moment about, 202 
of least-squares estimators, 
581-582 
mixed distributions, 213 
negative binomial distribution, 
122-123, 836 
normal distribution, 353—354, 
837 
overall, 678 
Poisson distribution, 134—135, 
141, 836 
sampling distribution, 347, 351 
small-sample test for, 521—522 
uniform distribution, 176, 837 
See also Difference between 
means 
Mean square error of point 
estimators, 393 
Mean square for blocks 
(MSB), 689 
Mean square for error (MSE), 665, 
681, 689, 690 
Mean square for treatments (MST), 
665, 679—681, 690 
Mean squares, 665, 688 
Measures of central tendency, 9 
Measures of dispersion, 9 
Measures of variation, 9 
Median 
point estimation, 445 
random variable, 164, 747 
Memoryless property, 189 
of exponential distribution, 189 
of geometric distribution, 119 
Mendel, Gregor, 55 


Method of distribution functions, 
298-310 
summary of, 304 
transformation method and, 
310—311 
Method of least squares, 564, 
569-576, 633 
fitting a straight line by, 642—643 
Method of maximum likelihood, 
476-483 
examples of using, 110, 118, 
477-480 
formal statement of, 477 
Method of moment-generating 
functions, 298, 318—325 
summary of, 322 
uses for, 320, 321 
Method of moments, 472-476 
formal statement of, 473 
uses for, 472, 475 
Method-of-moments estimators, 
472-475, 603 
Method of transformations, 298, 
310-318 
distribution function method and, 
310-311 
multivariable, 325—333 
summary of, 316 
Minimal sufficient statistics, 465, 
471 
Minimum of random variables, 333 
Minimum-variance unbiased 
estimation, 464—472 
Minimum-variance unbiased 
estimators (MVUEs), 
465-472, 476 
and the method of maximum 
likelihood, 476-477 
unique, 472 
Mixed distribution, 211—212 
MLEs. See Maximum-likelihood 
estimators 
mn rule, 41—43 
Modal category, 7 
Model parameters 
multiple linear regression, 
616-624 
simple linear regression, 589-593 
Models, 14 
allometric equation, 606 
block effects, 686 


complete, 624, 626-628 

deterministic, 564—565, 566 

fitted, 628—630 

fixed block effects, 686 

hierarchical, 288—289 

linear, 566—569 

linearized, 606 

location, 743 

mathematical, 14 

multiple linear regression, 
566—567, 569, 609, 615-624 

no-intercept, 575 

nonlinear, 608 

one-way layout, 677-679 

planar, 629, 630 

probabilistic, 26-35, 565 

quadratic, 614 

random block effects, 686 

for randomized block design, 
686—687 

reduced, 624, 626—628, 629 

regression, 566—567, 634 

second-order, 628-630 

selection of, 201 

simple linear regression, 566 

two-sample shift, 742—743 

See also Statistical models 

Moment-generating function 

method, 298, 318—325 

summary of, 322 

uses for, 320, 321 

Moment-generating functions, 

138-143 

applications for, 139—140, 141 

beta distribution, 837 

binomial distribution, 836 

central limit theorem and, 
377—378 

chi-square distribution, 837 

continuous random variable, 
202-207 

definitions of, 139, 202 

discrete random variable, 
138-143 

exponential distribution, 837 

extracting moments from, 204 

for a function of a random 
variable, 205 

gamma distribution, 837 

geometric distribution, 836 

hypergeometric distribution, 836 
kth derivative of, 139, 202
method of, 318—325 
negative binomial distribution, 
836 
normal distribution, 837 
Poisson distribution, 140, 836 
probability distributions and, 141 
probability-generating function 
and, 144 
random variable, 139, 202 
uniform distribution, 837 
Moments, 138-143 
central, 138, 202 
for continuous random variables, 
202 
factorial, 144 
method of, 472-476 
population, 472—473 
of random variables, 138—139, 
472-473 
sample, 472-473 
taken about mean, 138 
taken about origin, 138 
Most powerful test, 542—543 
MSB. See Mean square for blocks 
MSE. See Mean square for error 
MST. See Mean square for 
treatments 
Multicollinearity, 634 
Multinomial coefficients, 45 
Multinomial distributions, 279—283, 
735 
Multinomial experiments, 279—282, 
713-714 
Multinomial term, 45 
Multiple coefficient of 
determination, 627 
Multiple linear regression model, 
566—567, 569, 609, 615—624 
confidence intervals for, 618 
hypothesis tests for, 618 
inferences about linear functions 
in, 616-622 
least-squares estimators and, 
615-616 
matrices and, 609 
predicting values using, 622-624 
Multiplication 
matrix, 822-824 
row-column, 822-824 
of matrix by real number, 822 
Multiplicative law of probability,
57, 238—239 
for independent events, 63 
Multivariable transformation 
method, 325—333 
Multivariate density function, 231 
Multivariate distributions, 223—295 
bivariate distributions and, 
224—235, 283-285 
conditional distributions and, 
238—242 
conditional expectations and, 
285—290 
covariance of two variables and, 
264—270 
expected values and, 255-261, 
285—290 
independent random variables 
and, 247-255 
marginal distributions and, 
235—238 
multinomial distributions and, 
279-283 
normal distributions and, 
283-285 
transformation method and, 314 
Multivariate normal density 
function, 283—284 
Multivariate normal distribution, 
283-285 
Multivariate probability function, 
232 
Mutually exclusive events, 58, 59 
and the additive law of 
probability, 63—64 
Mutually exclusive sets, 24 
MVUEs. See Minimum-variance 
unbiased estimators 


N 
Negative binomial distribution, 
121-125 
mean, 122-123, 836 
moment-generating function, 836 
probability function, 836 
variance, 122—123, 836 
Negative binomial random variable, 
122, 150 
Neyman-Pearson Lemma, 542-546 
theorem for, 542 
usefulness of, 546 


No-intercept model, 575 
Nonparametric statistics, 741—795 
definition of, 742 
Friedman test, 771—777 
Kruskal-Wallis test, 765—771 
Mann-Whitney U test, 756, 
758—765 
rank-sum test, 755—757 
runs test, 777—783 
sign test for a matched-pairs 
experiment, 744—750 
sources of further information on, 
790-791 
Spearman rank correlation 
coefficient, 783—789 
two-sample shift model, 
742-743 
uses for, 741—742, 789-790 
Wilcoxon rank-sum test, 755 
Wilcoxon signed-rank test, 
750-755 
Nonrandomness test, 780—781 
Normal approximation to binomial 
distribution, 378—385 
continuity correction associated 
with, 382 
when to use, 380 
Normal Approximation to Binomial 
Distribution applet, 382, 
383, 385 
Normal curve, 10—11 
area under, 380—382, 735, 847 
illustrated example of, 11 
table of areas, 522, 847 
Normal density function, 178-179 
Normal distribution, 10, 178—184 
asymptotic, 372 
bivariate, 283—285 
hypothesis testing and, 520—521 
linear functions of, 590 
log-normal distribution, 218, 344 
mean, 353-354, 837 
moment-generating function, 
179, 321—322, 837 
multivariate, 283—285 
point estimation and, 453-454 
probability function, 837 
sampling distributions and, 
353-369 
tables for, 847 
variance, 353—354, 837 


Normal prior distribution, 816 
Normal Probabilities applet, 181, 
182, 183, 515 
Normal random variable, 181 
Normal Tail Areas and Quantiles 
applet, 179, 183, 184 
Nuisance parameters, 546, 549 
Null hypothesis, 489—490 
choice of, 500, 519 
composite, 545—546 
confidence interval and, 511 
power of the test and, 540—541 
p-value and, 513 
simple, 542, 555 
testing, 624—633, 669 
Null set, 23 
Numerical descriptive methods, 
8-13 
Numerical events, 75 


O
Observed cell frequencies, 723 
Observed life, total, 340 
One-sided confidence interval, 407, 
426 
One-tailed tests, 499, 509, 751 
One-way layouts, 653, 662 
additivity of sums of squares for, 
679-681
analysis of variance for, 667—671 
ANOVA table for, 671—677 
balanced, 670 
estimation in, 681—685 
expected value of MST for, 
679—681 
Kruskal-Wallis test for, 765-771 
sample size selection for, 
696—698 
statistical model for, 677—679 
Operating characteristic curve, 151 
Order statistics, 333—340 
Outliers, 634 
Overall mean, 678 


P
p(y), 88, 91, 102
Paired data, 644—651 
Paired-difference experiment, 648 
Parameters, 91, 390 
of bivariate normal density 
function, 284 


definition of, 93 
of density function, 175 
estimated, 443, 569 
gamma distribution, 185 
inferences concerning model, 
589-593, 616-622 
least-square estimator, 584—589 
nuisance, 546, 549 
shape and scale, 185 
Parametric methods, 742, 789 
Pareto distributions, 310 
Partitions 
of objects into groups, 44 
of sample space, 70, 71 
of total sum of squares, 
662, 688 
Pearson, Karl, 714, 715 
Pearson's test statistic, 714, 715 
Percentile, 164 
Permutation, 43 
Pivotal method, 407—409 
Pivotal quantity, 441 
Planar model, 629, 630 
Plots of residuals, 634 
PointbyPoint applet, 455 
Point estimate, 391 
Point estimation, 392 
maximum-likelihood, 476—483 
method of moments, 472-476 
minimum-variance unbiased, 
465-472 
Point Estimation applet, 455 
Point estimators 
biased, 392, 393 
consistency of, 448—459 
expected values of, 397, 399 
goodness of, 392, 399-406 
mean square error of, 393 
method-of-moments, 472—475 
minimal sufficient, 467 
properties of, 445—464 
relative efficiency of, 445—448 
standard errors of, 397, 399 
sample sizes for, 397 
sufficiency of, 459—464 
unbiased, 392, 393, 396—399, 445 
See also Estimators 
PointSingle applet, 454—455 
Poisson distribution, 131—138 
mean, 134—135, 141, 836 


moment-generating function, 
140, 836 
partial sums for, 134 
probability function, 836 
relationship with gamma 
distribution, 185 
tables for, 842-846 
uses for, 132 
variance, 134—135, 141, 836 
Poisson process, 135 
Poisson random variable, 132 
mean and variance for, 150 
moment-generating function for, 
140 
Pooled estimator, 428, 664, 681 
Population 
definition of, 2 
random sample of, 77—79 
sign test comparison of, 747 
Population distributions 
differing in location, 743 
ranks used for comparing, 
755-757
testing for identical, 742—743 
Population mean 
large sample confidence interval 
for, 411-412 
maximum likelihood estimator 
for, 478—479 
minimum-variance unbiased 
estimator for, 467—468 
notation for, 9 
overall, 678 
relationship to expected value, 91 
small-sample confidence interval 
for, 427
small-sample hypothesis testing 
for, 520—522, 525—530 
small-sample tests for comparing, 
523 
Population mean comparisons, 
425-434 
analysis of variance, 663 
estimating differences between 
means, 641—642 
more than two means, 667—671 
summary of small-sample 
hypothesis tests for, 523 
Population moments, 472-473 
Population standard deviation, 10 
Population variance 
confidence intervals for, 434-437
consistent estimator for, 452 
maximum likelihood estimator 
for, 478-479 
MVUE for, 467—468 
notation for, 10 
pooled estimator for, 428, 523 
tests of hypotheses concerning, 
530-540 
Positive predictive value, 73 
Posterior density function, 797—798, 
800, 801, 817 
Posterior distribution, 798—805, 
816-819 
Posterior expected value, 800 
Power curve, 541 
Power distributions, 309—310, 458, 
463 
Power series, 204 
Power of the test, 540—549 
definition of, 540 
likelihood ratio tests and, 553 
most powerful test, 542—543 
type II errors and, 541 
uniformly most powerful test, 
544—546 
Practical significance, 519 
Prediction bands, 597 
Prediction intervals, 595—597, 608, 
623 
multiple linear regression, 623 
simple linear regression, 595—596 
Predictions 
errors made in, 594—595, 
622-623 
multiple linear regression and, 
622-624 
simple linear regression and, 
593-597 
Predictive distribution, 816—817 
Prior density, 816, 817 
Prior distribution, 796, 797—805 
Probabilistic models, 26—35, 565, 
566 
Probability, 20-85 
additive law of, 58, 699 
axioms of, 30 
Bayes’ rule, 71 
calculation methods for, 25—29, 
62-68 
conditional, 47, 51—57, 102 
convergence in, 457
counting rules for, 40—51 
definition of, 30 
event-composition method for, 
35, 62—69 
histogram, 89, 92, 104 
independence and, 53 
inference making and, 14, 
21—23 
of intersection of events, 57 
laws of, 58—59, 70 
law of total, 70-75 
marginal, 236 
multiplicative law of, 57 
numerical events and, 75 
Poisson, 131—138 
random variables and, 75—77 
relative frequency concept of, 21, 
29-30 
sample-point method for, 35—40 
sources of further information 
on, 80 
summary review of, 79-80 
supplementary exercises on, 
80-85 
type I error, 491, 493 
type II error, 491, 493, 507-510
unconditional, 51, 52, 102 
of union of events, 58-59 
Probability density function, 258, 
407 
Probability distributions. 
See Distributions 
Probability functions, 88 
beta, 835, 837 
binomial, 836 
bivariate, 224—225 
chi-square, 837 
conditional discrete, 239 
exponential, 837 
gamma, 835, 837 
geometric, 836 
hypergeometric, 836 
joint, 225 
logarithmic series, 739 
marginal, 236 
negative binomial, 836 
normal, 837 
Poisson, 836 
unconditional, 288 
uniform, 837 


Probability-generating functions, 
143-146 
definition of, 144 
geometric random variable, 145 
moment-generating functions 
and, 144 
Probability mass functions, 149 
Properties 
invariance, 480 
memoryless, 189 
p-values, 513-518 
computing, 515 
uses for, 513, 514 


Q 


Quadratic model, 614 
Qualitative variables, 662, 713 
Quantity, pivotal, 441 
Quantiles, 164 

Queuing theory, 143 


R 
r (test statistic), 599 
r_S (test statistic), 784, 786
Random assignment, 651—652 
Random block effects model, 686 
Random errors, 568, 584, 633 
Random events, 20 
Randomization, importance of, 657 
Randomized block design, 654—655 
analysis of variance for, 688—695 
estimation in, 695—696 
Friedman test for, 771—777 
linear model approach to, 703 
sample size for, 696 
statistical model for, 686—687 
Randomized design, 652 
Randomness test, 777—783 
Random number generator, 307 
Random number table, 872-875 
Random sample, 78 
independent, 653, 755—765 
simple, 78 
size of, 421—424 
as sufficient statistic, 461 
Random sampling, 77—79 
Random variables, 75—77 
Bernoulli, 166, 322, 462, 466 
beta, 194—196 
binomial, 107-108 
chi-square, 187—188 


conditional density of, 240—241 

conditional discrete probability 
functions of, 239 

continuous, 157-158 

covariance of, 264—270 

density function of, 161—165 

dependent, 247, 564 

discrete, 86—87 

distribution function of, 158—165 

expected values of, 91—100, 
170-174, 202-207, 255-258 

exponential, 188 

factorial moments for, 144 

functions of, 296-345 

gamma, 187-188 

geometric, 116-117 

hypergeometric, 126 

independent, 247—255, 564 

integer-valued, 143-144 

jointly continuous, 227, 228 

jointly discrete, 227 

kth factorial moment of, 138 

kth moment of, 138, 202, 472 

linear functions of, 270-279 

means for, 150 

measures of dependence, 264 

median of, 164, 747 

minimum/maximum of, 333 

mixed distribution, 211—212 

moment-generating functions of, 
138-141 

moments of, 138 

negative binomial, 122 

normal, 181 

ordered, 333 

Poisson, 132 

predicting values of, using 
multiple regression, 
622—624 

predicting values of, using simple 
linear regression, 593—598 

probability density function for, 
161-165, 171-172 

probability-generating functions 
for, 143-146 

standard deviation of, 93 

standard normal, 181 

t-distributed, compared with 
normal, 359—360 

testing for independence of, 
598-604 


uncorrelated, 265, 267 
uniform, 174—176 
univariate, 226 
variance, 93 
vector, 598 
Weibull, 219 
Range, 12 
Rank, 755—757 
Rank correlation coefficient, 
783—789 
Rank sums, 755—757 
Rank-sum test, 755—757, 758 
Kruskal-Wallis test and, 768 
Mann-Whitney U test and, 758, 
762 
Rao-Blackwell theorem, 464-472
Rayleigh density, 318, 458 
r × c contingency tables, 721-734
degrees of freedom for, 724 
fixed row or column totals in, 
729-734 
Reduced model, 624, 626—628, 629 
compared with complete model, 
627-630 
Regression 
multiple linear, 566—567, 569, 
609, 615-624 
simple linear, 577—583, 589-597 
Regression models, 566—567, 634 
lack of fit, 634 
multiple linear, 566—567, 569, 
609, 615-622 
simple linear, 566, 569, 589-597 
Regularity conditions, 553 
Rejection region (RR), 490-491 
complement of, 511 
form of, 543 
F test, 536 
graph of, 534 
likelihood ratio test, 550, 552 
lower-tail, 499 
one-tailed, 751 
runs test, 778, 781, 782 
two-tailed, 499, 500, 584, 751 
upper-tail, 497 
Relative efficiency, 445—448 
Relative frequency distribution, 4, 5 
Relative frequency histogram, 4, 
371 
Reliability functions, 343 


Removing Points from Regression
applet, 639
Rescaling independent variables,
628
Research hypothesis, 489-490
See also Alternative hypothesis


Residuals, 634 

Response variable, 566 

Robust statistical tests, 525 
Row-column multiplication, 822-824 
Row operations, 829-832 

Row probabilities, 722, 724-725,
729-731
Runs, 778, 869
Runs test, 777-783
expected value of, 782
Mann-Whitney U test and, 781
rejection region for, 778,
781, 782
table of runs, 869-870
time series and, 780-781
variance of, 782


S 


Sample 


definition of, 2 

elements affecting information in, 
640—641 

independent, 645, 653 

likelihood of, 460—461 

paired, 644—651 

random, 78 

size of, 421—425 


Sample correlation coefficient,
598-599
nonparametric analogue to, 783
Sample mean, formula and notation,
9


Sample median, 445 
Sample moments, 472—473 
Sample point, 27 


as an ordered pair, 41 
equiprobable, 38, 120 
representations of, 43 
simple event and, 27 
tools for counting, 40—51 


Sample-point method, 35—40 


and combinatorial analysis, 
40—51 

examples of using, 36—37, 38 

steps in process of, 36 
Sample size
confidence interval, 411—421, 
483-484 
hypothesis test, 496—507, 
520—530 
large, 411-415, 496—507, 553 
likelihood ratio test, 553 
one-way layouts and, 696-698 
randomized block design and, 
696 
selecting, 421—424, 696—698 
small, 425—434, 520—530 
Z-test, 507—510 
Sample Size applet, 352, 373-374 
Sample space, 28, 70 
discrete, 28, 29 
partition of, 70, 71 
Sample variance, 10 
Sampling 
error in repeated, 594 
random, 77—79 
Sampling Distribution of the Mean 
applet, 351 
Sampling distributions, 346—389 
central limit theorem and, 
370—385 
chi-square distributions and, 
356-358 
introductory overview of, 
346—349 
mean, 347, 351, 364 
normal distributions and, 353—369 
sum of squares and, 356 
unbiased point estimator, 393 
variance, 352, 353, 364 
Sampling methods, 77 
matched-pair, 644—651 
random, 77—79 
replacement in, 78 
simple random, 78 
Sampling procedure. See 
Experimental design 
Sampling with/without replacement, 
78 
Scale parameter, 185 
Second-order models, 628—630 
Sensitivity of a test, 73 
Series 
geometric, 67, 835 
logarithmic, 739 
Taylor, 835 
Sets, 23-25
complement of, 24 
DeMorgan's laws of, 25 
disjoint, 24 
distributive laws of, 25 
empty, 23 
intersection of, 24 
mutually exclusive, 24 
notation for, 23—26 
null or empty, 23 
subsets of, 23 
union of, 23—24 
universal, 23 
Venn diagrams and, 23-25 
Shape parameter, 185 
Shift model, 743 
Signed-rank test. See Wilcoxon 
signed-rank test 
Significance, statistical versus 
practical, 518 
Significance level, 513-518 
attained, 513-518, 745—746 
Sign test for a matched-pairs 
experiment, 744—750 
attained significance levels for, 
745-746 
Friedman test and, 773 
large sample comparisons and, 
746-747 
Student's t test compared to, 
746-747 
usefulness of, 747 
Simple events, 27 
Simple hypothesis, 541—542 
Simple linear regression model, 
566, 569, 577—583, 589-597 
confidence intervals for, 586, 590, 
591, 596-597 
correlation and, 598—604 
hypothesis tests for, 585, 590 
inferences about linear functions 
in, 589-593 
least-squares estimators for, 571, 
577-583, 610 
matrices and, 610, 613 
predicting values using, 593—597 
Simple random sampling, 78 
Simultaneous confidence 
coefficient, 699—700 
Simultaneous system of linear 
equations 


matrix expression for, 827—829 
solving using matrices, 833—834 
Single-factor experiment, 652 
Size of samples. See Sample size 
Slope, estimating, 643 
Slutsky's theorem, 453 
Small-sample confidence intervals, 
425-434 
summary, 430
Small-sample hypothesis testing, 
520-530 
for comparing two population 
means, 523 
for a population mean, 521
Spearman rank correlation 
coefficient, 783—789 
critical values table, 871 
summary of the test, 786 
Specificity of a test, 73 
SSB. See Sum of squares 
for blocks 
SSE. See Sum of squares for error 
SST. See Sum of squares for 
treatments 
Standard deviation 
confidence bound for, 436 
definition of, 10 
population, 10 
random variable, 93 
sampling distribution of, 348 
sum of squares of deviations and, 
643 
Standard errors 
paired data and, 645 
of point estimators, 397, 399 
Standard normal distribution, 318 
Standard normal random variable, 
181 
Statistic, 346-347 
kth-order, 336 
sufficient. See Sufficient statistic 
Statistical models 
for one-way layouts, 677—679 
for randomized block designs, 
686-687 
See also Models 
Statistical significance, 519 
Statistical tests 
elements of, 489—495 
goodness of, 540, 556 
power of, 540-549 


reporting results of, 513-518 
robustness of, 525, 537 
theory of, 518-519 
See also Hypothesis tests 
Statistics, 347 
definition of, 1-2 
F, 535-537
and hypothesis testing, 489 
kth order, 336—337 
minimal sufficient, 465, 467, 471 
nonparametric, 742 
objective of, 2-3 
order, 333—340 
parametric, 742 
sufficient, 459, 461—462 
uses for, 1 
Step functions, 159 
Stochastic events, 20 
Student's t distribution, 359-361,
523 
See also t distribution 
Student's t Probabilities and 
Quantiles applet, 361, 366, 
522, 525, 526, 586, 601, 
605, 619, 647, 700 
Subsets, 23 
Sufficiency, 459-464 
definition of, 460 
and likelihood, 460-461 
Sufficient statistics, 459, 461—462 
confidence intervals and, 468 
functions of, 465, 470 
minimal, 465, 471 
unbiased estimators and, 464—470 
uses for, 464—465, 468 
Sum of functions, expected value of, 
94—95, 170-171, 258—259 
Summations, formulas for, 835 
Sum of a geometric series, 835 
Sum of squares for blocks (SSB), 
688 
Sum of squares for error (SSE), 570 
ANOVA procedure and, 
662-663, 668-669 
coefficient of determination and, 
601 
complete model and, 624 
formula for, 581, 688 
pooled, 666 
as portion of total sum of squares, 
663 


reduced model and, 624 
simple linear regression and, 581, 601
Sum of squares for independent 
variables, 356 
Sum of squares for treatments 
(SST), 664 
formula for, 688 
rank analogue of, 766, 771 

Sum of squares of deviations 
additivity of, 679-681 
adjusted, 625 
complete model and, 624 
minimizing, 569 
reduced model and, 624 
standard deviation and, 643 
total sum of squares and, 662 

Symmetric difference, 74 


T
T (test statistic) 
hypothesis tests and, 521, 523, 
585 
multiple linear regression, 618 
simple linear regression, 590 
table of critical values of, 
867—868 
t test and, 521 
Wilcoxon signed-rank test and, 
751, 867-868 
Tables 
analysis of variance, 671—677, 
689, 690 
binomial distribution, 380, 381, 
838-840 
chi-square distribution, 356, 
849-850 
contingency, 721—729 
critical values of T, 867—868 
distribution function of U, 
861—866 
e^-x, 841
F distribution, 363, 851—860 
Kruskal—Wallis test, 767 
normal curve areas, 847 
Poisson distribution, 842—846 
random numbers, 872—875 
runs distribution, 869—870 
Spearman rank correlation, 785, 
871 


t distribution, 848 
three-way, 735 
Tables of the Incomplete Beta 
Function (Pearson), 194 
Tables of the Incomplete Gamma 
Function (Pearson), 186 
Target parameter, 391 
Taylor series expansion, 835 
Tchebysheff’s theorem, 18 
bounds for probability in, 401 
continuous random variables and, 
207—210 
discrete random variables and, 
146-149 
error of estimation and, 400—401 
formal statement of, 146, 207 
point estimators and, 450 
uses for, 208, 209 
t density function, 360 
t distribution, 359—361 
degrees of freedom for, 360, 426, 
430, 584 
density function of, 360, 426 
hypothesis testing and, 521 
table of percentage points of, 848 
Testing hypotheses. See Hypothesis 
testing 
Test of homogeneity, 731 
Test statistic 
as element of statistical test, 490 
See also specific test statistics 
Theoretical models, 161 
Theory 
hypothesis testing, 518 
queuing, 143 
reality and, 14 
Three-way tables, 735 
Ties 
in paired experiments, 746, 
750-751, 766 
in rank correlation, 783—784 
Time series, 780—781 
Total observed life, 340 
Total probability law, 70—75 
Total sum of squares, 662—663 
partitioning of, 662, 688 


Transformation method, 298, 310-318
distribution function method and, 310-311
multivariable, 325-333
summary of, 316
Transpose of a matrix, 827
Treatments, 652, 656, 662 
effect of, 678 
Latin square design, 655 
mean square for, 665, 679—681 
randomized block design, 
654—655, 686 
sum of squares for, 664 
Trials, experimental, 100-101 
t tests, 521 
from the analysis of variance test, 
666 
using least squares estimators, 
565 
sign tests vs., 746—747 
two-sample, 525, 666—667 
usefulness of, 525 
Two-sample shift model, 742—743 
assumptions for, 743 
Two-sample t test, 525, 666-667
Two-sided confidence interval, 407, 
426 
Two-tailed alternative, 499, 500 
Two-tailed rejection region, 499, 
500 
Two-tailed tests, 499, 584, 751 
p-value for, 514—516 
when to use, 500, 518 
Two-way ANOVA table, 735 
Two-way tables, 735 
Type I errors, 491, 493-494 
probability of, 491, 493 
related to type II errors, 493 
Type II errors, 491, 493-494
power of tests and, 541 
probability of, 491, 493, 507—510 
related to type I errors, 493 


U 
U (test statistic), 758, 759, 762 
distribution function table, 
861—866 
expected value of, 761—762 
formula for, 758 
variance of, 761—762 
Unbiased confidence interval, 443 
Unbiased point estimators, 392, 393 
consistency of, 450 
minimum-variance, 464—472 
Rao-Blackwell theorem for, 
464—472 


relative efficiency of, 445
sample variance as, 398 
sampling distributions for, 393 
simple linear regression and, 577 
unique minimum-variance, 472 


Unconditional probability, 51, 52, 102
Unconditional probability function, 288


Uncorrelated variables, 265, 267 
Uniform density function, 175 
Uniform distribution, 174—178 




mean, 176, 837 

median, 176 

moment-generating function, 
837 

probability function, 837 

variance, 186, 837 

Uniformly most powerful test,
544-546
Uniform prior, 817
Uniform random variable, 174-176
Union of events, 57-58
probability of, 57-58
Union of sets, 23-24
Unique minimum-variance unbiased
estimator (UMVUE), 472


Uniqueness theorem, 318 
Universal set, 23 
Unrestricted maximum-likelihood
estimator, 551
Upper confidence bound, 412, 426,
434, 512
Upper confidence limit, 406
Upper-tail alternative, 497
Upper-tail rejection region, 497
Upper-tail test, 512


V 


Variables 


Bernoulli, 166, 322, 462, 466 
continuous, 157-158 
dependent, 247, 564 

discrete, 86—87 

dummy, 701 

independent, 247—255, 564 
indicator, 701 


nonrandom, 564 
qualitative, 662 
random, 75—77 
rescaled, 628 

response, 566 

sum of squares for, 663 
uncorrelated, 265, 267 


Variance 


analysis of, 661—712 

beta distribution, 195, 837 

binomial distribution, 106-108, 
836 

chi-square distribution, 837 

comparison of, 361—362, 
533-535 

conditional, 287 

confidence intervals and, 
434—437, 640 

of continuous random variable, 
170-171 

definition of, 10 

discrete random variable, 95—96, 
150 

exponential distribution, 837 

gamma distribution, 186, 837 

geometric distribution, 117-118, 
836 

hypergeometric distribution, 127, 
836 

hypothesis tests and, 530—540 

least-squares estimators, 577—581 

linear functions and, 270-279 

maximum-likelihood estimator 
for, 480 

minimum, 465 

mixed distribution, 213 

negative binomial distribution, 
836 

normal distribution, 353—354, 
837 

of point estimators, 393 

Poisson distribution, 134—135, 
141, 836 

pooled estimator for, 428, 523 

of random variable, 93 

relative efficiency of, 445 

runs test and, 782 


sample, 398 
sampling distribution of, 352, 353 
t distribution, 360 
unbiased estimator for, 577 
uniform distribution, 186, 837 
U test statistic and, 761—762 
See also Analysis of variance 
VarianceSize applet, 353 
Variation 
coefficient of, 387 
measures of, 9 
Vector random variable, 598 
Venn diagrams, 23-25 


W 
W (test statistic), 756-757, 758 
Weibull density function, 219, 317, 
339, 466 
Weibull distribution, 202, 219, 468 
Weibull random variable, 219 
Weighted average, 428 
Wilcoxon, Frank, 755 
Wilcoxon rank-sum test, 755-757, 
758, 762 
Wilcoxon signed-rank test, 
750-755 
critical values of T in, 867 
large samples and, 752-753 
summary of, 751 


Y 


Y value, predicting, 593-597, 
622—624 


Z 
Z (test statistic) 
hypothesis tests and, 500 
large samples and, 500, 747, 752 
Mann-Whitney U test and, 762 
runs test and, 782 
sample size and, 507—510 
sign test and, 747 
Wilcoxon signed-rank test and, 
752-753
Z-test and, 507 
Zero covariance, 267—268, 284 
Zero probability, 161 


Normal Curve Areas 


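Standard normal probability in right-hand tail (for negative values of z, areas are found by symmetry)

[Figure: standard normal curve with the right-hand tail area beyond z shaded; horizontal axis marked at 0 and z]

Second decimal place of z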
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 
0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379 
1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823 
1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0722 .0708 .0694 .0681
1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559 
1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455 
1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367 
1.8 .0359 .0352 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294 
1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233 
2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183 
2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064 
2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048 
2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036 
2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026 
2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019 
2.9 .0019 .0018 .0017 .0017 .0016 .0016 .0015 .0015 .0014 .0014 
3.0 .00135 
3.5  .000233 
4.0 .0000317 
4.5 .000 003 40 
5.0 .000 000 287


From R. E. Walpole, Introduction to Statistics (New York: Macmillan, 1968).
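
Entries in the normal curve areas table can be checked numerically. The following is a minimal Python sketch, not part of the original table: it assumes the identity P(Z > z) = (1/2) erfc(z / sqrt(2)) for a standard normal random variable Z, and the function name `upper_tail_area` is chosen here only for illustration.

```python
import math


def upper_tail_area(z):
    """Return P(Z > z) for a standard normal random variable Z."""
    # Assumed identity: P(Z > z) = 0.5 * erfc(z / sqrt(2)),
    # where erfc is the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Print a few areas in the same four-decimal format as the table.
for z in (0.00, 1.00, 1.96, 3.00):
    print(f"z = {z:.2f}  area = {upper_tail_area(z):.4f}")
```

For a negative value of z the same function applies directly, and the result agrees with the symmetry rule stated in the table heading: the area to the right of -z equals one minus the area to the right of z.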


Percentage Points of the t Distributions 




t.100 t.050 t.025 t.010 t.005 df
3.078 6.314 12.706 31.821 63.657 1 
1.886 2.920 4.303 6.965 9.925 2 
1.638 2.353 3.182 4.541 5.841 3 
1.533 2.132 2.776 3.747 4.604 4 
1.476 2.015 2.571 3.365 4.032 5 
1.440 1.943 2.447 3.143 3.707 6 
1.415 1.895 2.365 2.998 3.499 7 
1.397 1.860 2.306 2.896 3.355 8
1.383 1.833 2.262 2.821 3.250 9 
1.372 1.812 2.228 2.764 3.169 10 
1.363 1.796 2.201 2.718 3.106 11 
1.356 1.782 2.179 2.681 3.055 12 
1.350 1.771 2.160 2.650 3.012 13 
1.345 1.761 2.145 2.624 2.977 14 
1.341 1.753 2.131 2.602 2.947 15
1.337 1.746 2.120 2.583 2.921 16 
1.333 1.740 2.110 2.567 2.898 17 
1.330 1.734 2.101 2.552 2.878 18 
1.328 1.729 2.093 2.539 2.861 19 
1.325 1.725 2.086 2.528 2.845 20 
1.323 1.721 2.080 2.518 2.831 21 
1.321 1.717 2.074 2.508 2.819 22 
1.319 1.714 2.069 2.500 2.807 23 
1.318 1.711 2.064 2.492 2.797 24 
1.316 1.708 2.060 2.485 2.787 25 
1.315 1.706 2.056 2.479 2.779 26 
1.314 1.703 2.052 2.473 2.771 27 
1.313 1.701 2.048 2.467 2.763 28 
1.311 1.699 2.045 2.462 2.756 29 
1.282 1.645 1.960 2.326 2.576 inf. 


From “Table of Percentage Points of the t-Distribution.” Computed by Maxine 
Merrington, Biometrika, Vol. 32 (1941), p. 300. 
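
A percentage point t_alpha in the table above is the value with upper-tail probability alpha, that is, P(T > t_alpha) = alpha for a t-distributed random variable T with the stated degrees of freedom. The sketch below is one possible way to reproduce an entry approximately using only the Python standard library. It is an illustration under stated assumptions, not the method used to compute the published table: it integrates the t density with Simpson's rule and locates the percentage point by bisection. The function names, the step count, and the search bound of 100 are all choices made here for the example.

```python
import math


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


def t_upper_tail(t, df, steps=4000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    # By symmetry P(T > 0) = 0.5, so P(T > t) = 0.5 - integral of the density on [0, t].
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3.0


def t_percentage_point(alpha, df, upper=100.0):
    """Approximate t_alpha with P(T > t_alpha) = alpha, for 0 < alpha < 0.5."""
    # Bisection: the upper-tail area decreases as t increases.
    # The bound `upper` is an assumption; it must exceed the answer.
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(t_percentage_point(0.025, 10), 3))
```

The printed value can be compared with the t.025 column in the row for 10 degrees of freedom. The result is a numerical approximation, so small differences in the last decimal place are possible for other entries.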


